COMPILED FUNCTION TOOLS TUTORIAL

Parallel Computation

The Mathematica compiler can run computations in parallel by threading a compiled function over lists of data, with different parts of a list processed on different threads. The first step is to create a compiled function with the Listable runtime attribute.
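
Here is a minimal sketch of such a definition. The name cf and the body Sin[x] + x^2 - 1/(1 + x) are arbitrary choices for illustration, not the code from the original example. The option RuntimeAttributes -> {Listable} is what makes the compiled function listable. Calling it on a single real number, which matches the declared type, simply evaluates it.

```mathematica
(* Hypothetical example function: one real argument, arbitrary body *)
(* RuntimeAttributes -> {Listable} lets the compiled code thread over lists *)
cf = Compile[{{x, _Real}},
   Sin[x] + x^2 - 1/(1 + x),
   RuntimeAttributes -> {Listable}];

(* A single real number matches the declared type, so cf runs normally *)
cf[1.5]
```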

When the input matches the type specification of the compiled function, it works normally. In the following example, the real number input matches the type of the compiled function and the function executes.

Here the compiled function receives a list as input. Since the list has a higher rank than the declared argument, the function threads over it and is applied to each element.
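
As a quick check, threading over a list should give the same values as mapping the function over that list. The sketch below uses an illustrative compiled function cf with an arbitrary body. It is defined inside the block so the block runs on its own.

```mathematica
cf = Compile[{{x, _Real}},
   Sin[x] + x^2 - 1/(1 + x),
   RuntimeAttributes -> {Listable}];

xs = {1.5, 2.5, 3.5};

(* The list has rank 1 while the argument is declared as a rank-0 real,
   so cf is applied to each element *)
threaded = cf[xs];

(* Compare with explicitly mapping cf over the list *)
threaded == Map[cf, xs]
```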

A listable compiled function can also run in parallel. This is done by setting the Parallelization option to True.

As with the serial version, when the input matches the type specification of the compiled function, it works normally.
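
Here is a sketch of a parallel listable compiled function. It assumes an arbitrary illustrative body and uses the hypothetical name cfPar. The only addition to a plain listable definition is the Parallelization -> True option. With a scalar argument there is nothing to thread over, so the call behaves like an ordinary compiled call.

```mathematica
(* Illustrative body, compiled for parallel execution over lists *)
cfPar = Compile[{{x, _Real}},
   Sin[x] + x^2 - 1/(1 + x),
   RuntimeAttributes -> {Listable}, Parallelization -> True];

(* A scalar argument needs no threading, so this runs like any compiled call *)
cfPar[1.5]
```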

Here the compiled function again receives a list as input. Since the list has a higher rank than the declared argument, the function threads over it, and now the elements are processed in parallel.
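
Parallel threading should change only how the work is scheduled, not the result. One way to see this is to compare a serial and a parallel compiled function with the same hypothetical body on a large list of inputs.

```mathematica
cfSerial = Compile[{{x, _Real}}, Sin[x] + x^2 - 1/(1 + x),
   RuntimeAttributes -> {Listable}];
cfPar = Compile[{{x, _Real}}, Sin[x] + x^2 - 1/(1 + x),
   RuntimeAttributes -> {Listable}, Parallelization -> True];

(* Inputs in [0, 10] stay away from the pole of 1/(1 + x) at x = -1 *)
data = RandomReal[{0, 10}, 10^5];

(* The list is split across threads, but each element is computed
   exactly as in the serial version *)
cfPar[data] == cfSerial[data]
```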

Computation Speedup

The main reason for using parallelization is to speed up computation, and in many cases it does this well.

The following creates and times a compiled function that runs sequentially.

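This creates an equivalent compiled function that runs in parallel. On the machine used here it runs about twice as fast. The speedup you see depends on the number of processor cores and on how much work is done per element.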
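
Here is a sketch of such a timing comparison. The loop body (a sum of 2000 sines per element) and the input size are hypothetical. They were chosen only so that each element carries enough work for parallel execution to pay off. The measured ratio will depend on your machine.

```mathematica
(* Hypothetical per-element workload, compiled without parallelization *)
cfSeq = Compile[{{x, _Real}},
   Module[{s = 0.}, Do[s += Sin[x i], {i, 1, 2000}]; s],
   RuntimeAttributes -> {Listable}, Parallelization -> False];

(* The same workload, threaded over lists in parallel *)
cfPar = Compile[{{x, _Real}},
   Module[{s = 0.}, Do[s += Sin[x i], {i, 1, 2000}]; s],
   RuntimeAttributes -> {Listable}, Parallelization -> True];

data = RandomReal[1, 20000];
{tSeq, resSeq} = AbsoluteTiming[cfSeq[data]];
{tPar, resPar} = AbsoluteTiming[cfPar[data]];

(* Speedup ratio; depends on the machine and its number of cores *)
tSeq/tPar
```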

The number of parallel threads used to run the compiled function defaults to the setting of $ProcessorCount.
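
For example, evaluating $ProcessorCount shows how many processor cores Mathematica detected. By the default described here, this is also the number of threads used by a parallel compiled function.

```mathematica
(* Number of processor cores Mathematica detects on this machine *)
$ProcessorCount
```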

CompilationTarget

You can accelerate the parallel compiled function further by setting CompilationTarget -> "C". This requires a C compiler that Mathematica can use.
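
Here is a sketch of compiling to C. It assumes a C compiler that Mathematica can find is installed. If none is available, Compile issues a message. The workload, a loop summing 2000 sines per element, and the name cfC are illustrative.

```mathematica
(* Parallel listable function compiled to C; needs a working C compiler *)
cfC = Compile[{{x, _Real}},
   Module[{s = 0.}, Do[s += Sin[x i], {i, 1, 2000}]; s],
   RuntimeAttributes -> {Listable}, Parallelization -> True,
   CompilationTarget -> "C"];

data = RandomReal[1, 20000];

(* Time the call; compare with the same function compiled for the WVM *)
AbsoluteTiming[resC = cfC[data];]
```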

External Calls

An important issue in parallel computation is what happens when two threads try to do the same thing at the same time, such as updating a shared variable. For parallel compiled functions this is handled automatically.

The following defines a function that, besides returning a value, increments a global counter each time it is called.

This parallel compiled function calls that function. Since the function is not compiled, each call is an external call back to the Mathematica evaluator.

This executes the compiled function in parallel over a list of input data.

The counter has been incremented the correct number of times.
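
The steps above can be sketched as follows. The function f, the counter count, and the compiled function cfExt are hypothetical names. The third argument of Compile, {{f[_], _Real}}, tells the compiler that calls to the uncompiled f return a real number. If the external calls are synchronized as described, count ends up equal to the number of input elements.

```mathematica
count = 0;

(* Hypothetical uncompiled function: every call from compiled code
   goes back to the Mathematica evaluator *)
f[x_] := (count++; x + 1.);

(* Declare the result type of f so the rest of the body can compile *)
cfExt = Compile[{{x, _Real}}, f[x], {{f[_], _Real}},
   RuntimeAttributes -> {Listable}, Parallelization -> True];

res = cfExt[Range[1., 1000.]];

(* One increment per element, even though all threads share count *)
count
```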

In fact, whenever a compiled function running in parallel makes an external call, the call is guarded by synchronization primitives so that only one thread can make it at any given moment. As a result, a parallel compiled function that makes many external calls will not get good parallel acceleration.

Random Numbers

Many computations with random numbers, such as Monte Carlo methods, can be sped up significantly when done in parallel. However, fast and effective use of random numbers in parallel requires having generators on separate execution threads that operate independently and generate random numbers that are statistically independent of the numbers generated on other threads. For this reason, the random generators that Mathematica uses by default for parallel computations are different from the one used for serial computation, and so the actual random numbers will necessarily be different. Further compounding the issue is that for a given parallel computation, the execution thread assigned to a particular part of the computation may be different in different runs, so results may vary from run to run even with the same initial random state.

The following creates two CompiledFunction objects that simulate a random walk of n steps on a lattice: one that runs serially, and one that runs multiple simulations in parallel.
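
Here is a sketch of two such functions. It assumes a one-dimensional lattice where each step is +1 or -1 with equal probability; the text does not say which lattice it uses. walkSerial and walkParallel are hypothetical names. Only the second is listable and parallelized, so several simulations run with one call on a list of step counts.

```mathematica
(* Serial 1D walk: n steps of +1 or -1, returns the final position *)
walkSerial = Compile[{{n, _Integer}},
   Module[{pos = 0},
    Do[pos += 2 RandomInteger[] - 1, {n}];
    pos]];

(* Same walk, listable and parallelized over a list of step counts *)
walkParallel = Compile[{{n, _Integer}},
   Module[{pos = 0},
    Do[pos += 2 RandomInteger[] - 1, {n}];
    pos],
   RuntimeAttributes -> {Listable}, Parallelization -> True];

(* Eight simulations of 1000 steps each, run in parallel *)
walkParallel[ConstantArray[1000, 8]]
```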

The serial version gives the same result every time it is run within BlockRandom.
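
Here is a sketch of this check. It uses a hypothetical 1D walk walkSerial, defined inside the block so the block is self-contained. BlockRandom restores the random state when it finishes, so two consecutive BlockRandom evaluations start from the same state.

```mathematica
walkSerial = Compile[{{n, _Integer}},
   Module[{pos = 0}, Do[pos += 2 RandomInteger[] - 1, {n}]; pos]];

(* Both runs start from the same random state and draw the same numbers *)
run1 = BlockRandom[Table[walkSerial[10^5], {10}]];
run2 = BlockRandom[Table[walkSerial[10^5], {10}]];

run1 == run2
```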

When run in parallel, however, the results differ from run to run.

Much of the difference is in the ordering of the results. Intersection shows that many of the values are actually the same.
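
Here is a sketch of the parallel comparison, using a hypothetical listable 1D walk walkParallel. A 1D walk ends on relatively few distinct positions, so a few matches between runs can also occur by chance. Intersection therefore gives only a rough indication here.

```mathematica
walkParallel = Compile[{{n, _Integer}},
   Module[{pos = 0}, Do[pos += 2 RandomInteger[] - 1, {n}]; pos],
   RuntimeAttributes -> {Listable}, Parallelization -> True];

(* Same initial random state for both runs *)
par1 = BlockRandom[walkParallel[ConstantArray[10^5, 10]]];
par2 = BlockRandom[walkParallel[ConstantArray[10^5, 10]]];

(* May be False: thread assignment can change between runs *)
par1 == par2

(* Values that occur in both runs *)
Intersection[par1, par2]
```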

The values that agree are those that were computed on the same execution thread in both runs.

Typically, when you use BlockRandom or SeedRandom, you will want to do it outside the parallel computation. If you use these commands inside a parallel evaluation, they affect only the current execution thread. The subtleties involved are described in "SeedRandom and BlockRandom in Parallel Computations" in "Random Number Generation".

You can change the random generators used for parallel computation with SeedRandom[seed, Method->"ParallelGenerator"]. By default, the parallel generators are a set of 1024 Mersenne Twister generators that produce good-quality random numbers. Mathematica uses a different one of them for each execution thread in a parallel computation. An important property of these generators is that each one produces numbers independent of those produced by the others, so there should be no correlations between computations done on different execution threads. One way to demonstrate this is with a standard test of random numbers called the blocking test.
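
Here is a minimal sketch of reseeding, using the call form given in the text. The seed 12345 and the compiled function draw are arbitrary. draw adds a uniform random real to each integer index k, so every result should lie in [k, k + 1). The sketch only illustrates the call. See "Random Number Generation" for the generator choices available.

```mathematica
(* Seed the parallel generators; 12345 is an arbitrary seed *)
SeedRandom[12345, Method -> "ParallelGenerator"];

(* Hypothetical parallel function: index plus one uniform draw *)
draw = Compile[{{k, _Integer}}, k + RandomReal[],
   RuntimeAttributes -> {Listable}, Parallelization -> True];

vals = draw[Range[0, 999]];
```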

The blocking test checks that sample means of numbers drawn from a given distribution converge to the normal distribution, as the central limit theorem says they should. If they do not, that indicates a possible problem with the random number generator.

The following defines a CompiledFunction that will get the sample mean from n integers that are equally likely to be 0 or 1, and will run in parallel when given a list argument.

Using this, define functions that generate m sample means of n bits in serial and in parallel.
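
Here is a sketch of these definitions, with the hypothetical names sampleMean, serialMeans, and parallelMeans. The compiled function adds up n draws of RandomInteger[] (each 0 or 1) and divides by n. serialMeans calls it once per mean. parallelMeans makes a single listable call on a list of m copies of n, so the work is threaded across execution threads.

```mathematica
(* Mean of n bits, each 0 or 1 with equal probability *)
sampleMean = Compile[{{n, _Integer}},
   Module[{s = 0},
    Do[s += RandomInteger[], {n}];
    s/N[n]],
   RuntimeAttributes -> {Listable}, Parallelization -> True];

(* m sample means of n bits: one scalar call per mean, run serially *)
serialMeans[m_Integer, n_Integer] := Table[sampleMean[n], {m}];

(* m sample means of n bits: one listable call, threaded in parallel *)
parallelMeans[m_Integer, n_Integer] := sampleMean[ConstantArray[n, m]];
```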

From the central limit theorem, the sample means should be approximately normally distributed with mean 1/2 and standard deviation 1/(2 Sqrt[n]).
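
The expected distribution can be written directly as a NormalDistribution object; expectedDist is a hypothetical name. Each bit has mean 1/2 and variance 1/4. The mean of n independent bits therefore has mean 1/2 and variance (1/4)/n = 1/(4 n), which gives the standard deviation 1/(2 Sqrt[n]).

```mathematica
(* Normal approximation for the mean of n fair bits *)
expectedDist[n_] := NormalDistribution[1/2, 1/(2 Sqrt[n])];

expectedDist[1000]
```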

Now generate sets of sample means serially and in parallel.

Note that the parallel computation is significantly faster.
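
Here is a sketch of generating and timing both data sets. It uses a hypothetical compiled sampleMean, defined inside the block so the block runs on its own. The sizes numMeans and numBits are illustrative.

```mathematica
sampleMean = Compile[{{n, _Integer}},
   Module[{s = 0}, Do[s += RandomInteger[], {n}]; s/N[n]],
   RuntimeAttributes -> {Listable}, Parallelization -> True];

numMeans = 10000; numBits = 1000;

(* Serial: one scalar call per sample mean *)
{tSerial, dataSerial} = AbsoluteTiming[Table[sampleMean[numBits], {numMeans}]];

(* Parallel: one listable call over numMeans copies of numBits *)
{tParallel, dataParallel} =
  AbsoluteTiming[sampleMean[ConstantArray[numBits, numMeans]]];

{tSerial, tParallel}
```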

The following shows histograms of the serial and parallel data compared with the PDF of the expected distribution.
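
Here is a sketch of the comparison plot, with a hypothetical sampleMean and illustrative sizes. The "PDF" height specification in Histogram normalizes the bars so they can be overlaid on the curve PDF[dist, x].

```mathematica
sampleMean = Compile[{{n, _Integer}},
   Module[{s = 0}, Do[s += RandomInteger[], {n}]; s/N[n]],
   RuntimeAttributes -> {Listable}, Parallelization -> True];

numMeans = 10000; numBits = 1000;
dataSerial = Table[sampleMean[numBits], {numMeans}];
dataParallel = sampleMean[ConstantArray[numBits, numMeans]];

(* Expected normal approximation for the mean of numBits fair bits *)
dist = NormalDistribution[1/2, 1/(2 Sqrt[numBits])];
pdfPlot = Plot[PDF[dist, x], {x, 0.4, 0.6}];

(* Density-normalized histograms overlaid with the expected PDF *)
plots = {Show[Histogram[dataSerial, Automatic, "PDF"], pdfPlot],
   Show[Histogram[dataParallel, Automatic, "PDF"], pdfPlot]}
```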

It is hard to tell visually whether one is better than the other. A better way is to use DistributionFitTest to get the p-value for the goodness-of-fit hypothesis test. The null hypothesis is that the data follows the expected distribution, and the alternative hypothesis is that it does not.
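
Here is a sketch of the test on one serial and one parallel data set, using a hypothetical sampleMean and illustrative sizes. With no property argument, DistributionFitTest returns the p-value.

```mathematica
sampleMean = Compile[{{n, _Integer}},
   Module[{s = 0}, Do[s += RandomInteger[], {n}]; s/N[n]],
   RuntimeAttributes -> {Listable}, Parallelization -> True];

numMeans = 10000; numBits = 1000;
dataSerial = Table[sampleMean[numBits], {numMeans}];
dataParallel = sampleMean[ConstantArray[numBits, numMeans]];
dist = NormalDistribution[1/2, 1/(2 Sqrt[numBits])];

(* Small p-values would be evidence against the null hypothesis *)
pValues = {DistributionFitTest[dataSerial, dist],
   DistributionFitTest[dataParallel, dist]}
```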

The p-value returned by DistributionFitTest is itself a random quantity. When the null hypothesis holds, it should be approximately uniformly distributed on [0, 1]. The best information therefore comes from comparing the serial and parallel p-values over many runs.
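
Here is a sketch of that comparison over repeated runs. The sizes and the number of runs are illustrative and kept small so the serial part finishes quickly. If both generators behave well, both lists of p-values should look roughly uniform on [0, 1].

```mathematica
sampleMean = Compile[{{n, _Integer}},
   Module[{s = 0}, Do[s += RandomInteger[], {n}]; s/N[n]],
   RuntimeAttributes -> {Listable}, Parallelization -> True];

{numMeans, numBits, runs} = {1000, 500, 40};
dist = NormalDistribution[1/2, 1/(2 Sqrt[numBits])];

(* One p-value per run, for serial and for parallel data *)
pSerial = Table[
   DistributionFitTest[Table[sampleMean[numBits], {numMeans}], dist], {runs}];
pParallel = Table[
   DistributionFitTest[sampleMean[ConstantArray[numBits, numMeans]], dist], {runs}];

(* Compare the two p-value distributions side by side *)
Histogram[{pSerial, pParallel}, {0, 1, 0.1}]
```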

Parallel Controls

Compiled functions execute in parallel using multiple threads of execution. The number of threads is set initially to be $ProcessorCount.

The actual setting can be modified with SystemOptions using the suboption of .
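
Here is a hedged sketch of changing the thread count. The option names used below, "ParallelThreadNumber" under "ParallelOptions", are an assumption. Evaluate SystemOptions["ParallelOptions"] first to confirm them on your version. threadSetting is a hypothetical helper. The sketch reads the current value, sets it to 1, and then restores the previous value.

```mathematica
(* Assumed option names: "ParallelThreadNumber" under "ParallelOptions" *)
threadSetting[] := "ParallelThreadNumber" /.
   ("ParallelOptions" /. SystemOptions["ParallelOptions"]);

oldThreads = threadSetting[]

(* Run parallel compiled functions on a single thread *)
SetSystemOptions["ParallelOptions" -> "ParallelThreadNumber" -> 1];
threadSetting[]

(* Restore the previous setting *)
SetSystemOptions["ParallelOptions" -> "ParallelThreadNumber" -> oldThreads];
```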

This demonstrates a parallel compiled function.

Now the number of threads is set to 1, which forces the compiled function to run sequentially.

The running time is now about the same as with Parallelization -> False.
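
Here is a sketch of that comparison. It uses a hypothetical per-element workload and assumed option names for the thread count ("ParallelThreadNumber" under "ParallelOptions"; check SystemOptions["ParallelOptions"] to confirm). It times a parallel compiled function limited to one thread against a compiled function without parallelization, then restores the thread count.

```mathematica
cfNoPar = Compile[{{x, _Real}},
   Module[{s = 0.}, Do[s += Sin[x i], {i, 1, 2000}]; s],
   RuntimeAttributes -> {Listable}, Parallelization -> False];
cfPar = Compile[{{x, _Real}},
   Module[{s = 0.}, Do[s += Sin[x i], {i, 1, 2000}]; s],
   RuntimeAttributes -> {Listable}, Parallelization -> True];
data = RandomReal[1, 20000];

(* Save the current thread count (assumed option names), then use one thread *)
oldThreads = "ParallelThreadNumber" /.
   ("ParallelOptions" /. SystemOptions["ParallelOptions"]);
SetSystemOptions["ParallelOptions" -> "ParallelThreadNumber" -> 1];

tOneThread = First[AbsoluteTiming[cfPar[data]]];
tNoParallel = First[AbsoluteTiming[cfNoPar[data]]];

(* Restore the saved thread count *)
SetSystemOptions["ParallelOptions" -> "ParallelThreadNumber" -> oldThreads];

{tOneThread, tNoParallel}
```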
