The Wolfram System compiler can run computations in parallel. It does this by threading a compiled function over lists of data in parallel. A first step is to create a compiled function with the Listable attribute.
When the input matches the type specification of the compiled function, it works normally. In the following example, the real number input matches the type of the compiled function and the function executes.
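The behavior described above can be sketched as follows. This is a minimal illustration, not the original example: the name `cf` and the function body are hypothetical, while `RuntimeAttributes -> {Listable}` and `Parallelization -> True` are the standard Compile options for threading over lists in parallel.

```wolfram
(* Hypothetical example function; the body only provides some work to do. *)
cf = Compile[{{x, _Real}},
   Module[{sum = 0.0},
    Do[sum += Sin[x + i]/i, {i, 1, 1000}];
    sum],
   RuntimeAttributes -> {Listable},  (* thread over list arguments *)
   Parallelization -> True];         (* run the threaded calls in parallel *)

(* A single real number matches the _Real type specification directly. *)
cf[1.5]

(* A list of reals does not match _Real, so the function threads over it. *)
cf[{1.0, 2.0, 3.0}]
```

The list call returns one result per element, so the output has the same length as the input list.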
The number of parallel threads used to run the compiled function defaults to the setting of $ProcessorCount.
You can get further acceleration for the parallel compiled function by setting CompilationTarget to C.
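A hedged sketch of the comparison follows. The names `cfWVM`, `cfC`, and `data` are hypothetical, and the example assumes a supported C compiler is installed; without one, Compile typically issues a message and falls back to the default target. Actual timings depend on the machine, so none are shown.

```wolfram
(* Same hypothetical body compiled for the default target and for C. *)
cfWVM = Compile[{{x, _Real}},
   Module[{sum = 0.0}, Do[sum += Sin[x + i]/i, {i, 1, 1000}]; sum],
   RuntimeAttributes -> {Listable}, Parallelization -> True];

cfC = Compile[{{x, _Real}},
   Module[{sum = 0.0}, Do[sum += Sin[x + i]/i, {i, 1, 1000}]; sum],
   CompilationTarget -> "C",  (* assumes a C compiler is available *)
   RuntimeAttributes -> {Listable}, Parallelization -> True];

data = RandomReal[1, 100000];

(* Compare wall-clock times in seconds. *)
First[AbsoluteTiming[cfWVM[data];]]
First[AbsoluteTiming[cfC[data];]]
```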
When a compiled function running in parallel makes an external call, the call is always guarded by synchronization primitives so that only one thread can make the call at any given moment. As a result, a parallel compiled function that makes many external calls will not get good parallel acceleration.
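One way to check whether a compiled function makes external calls is to inspect its compiled instructions, where calls back to the evaluator show up as MainEvaluate. The sketch below assumes the CompiledFunctionTools` package; the function `g` and the name `cfExt` are hypothetical and serve only to force an external call.

```wolfram
Needs["CompiledFunctionTools`"]

(* g is an ordinary, uncompiled definition, so calling it from compiled
   code requires a call back to the evaluator. *)
g[x_] := x^2

cfExt = Compile[{{x, _Real}},
   g[x] + 1.0,
   {{g[_], _Real}},  (* declare that g returns a real number *)
   RuntimeAttributes -> {Listable}, Parallelization -> True];

(* Look for MainEvaluate in the printed instructions. *)
StringContainsQ[CompilePrint[cfExt], "MainEvaluate"]
```

If the check finds MainEvaluate, each of those calls is serialized across threads, which limits the benefit of running in parallel.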
Many computations with random numbers, such as Monte Carlo methods, can be sped up significantly when done in parallel. However, fast and effective use of random numbers in parallel requires having generators on separate execution threads that operate independently and generate random numbers that are statistically independent of the numbers generated on other threads. For this reason, the random generators that the Wolfram Language uses by default for parallel computations are different from the one used for serial computation, and so the actual random numbers will necessarily be different. Further compounding the issue is that for a given parallel computation, the execution thread assigned to a particular part of the computation may be different in different runs, so results may vary from run to run even with the same initial random state.
The following creates two CompiledFunction objects that simulate a random walk of n steps on a lattice, one that runs serially and one that runs in parallel for multiple simulations.
The serial one gives the same result every time when run within BlockRandom.
The parallel one can give different results from run to run, but much of the difference is in the ordering of results. You can see that many are actually the same using Intersection.
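The comparison described above can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the original code: the names `walkSerial` and `walkParallel`, the choice of a two-dimensional lattice, and the step and simulation counts are all hypothetical.

```wolfram
(* Serial version: one random walk of n steps, returned as a path. *)
walkSerial = Compile[{{n, _Integer}},
   Module[{pos = {0, 0},
     steps = {{1, 0}, {-1, 0}, {0, 1}, {0, -1}},
     path = Table[{0, 0}, {n + 1}]},
    Do[
     pos += steps[[RandomInteger[{1, 4}]]];
     path[[i + 1]] = pos,
     {i, n}];
    path]];

(* Parallel version: same body, threads over a list of step counts. *)
walkParallel = Compile[{{n, _Integer}},
   Module[{pos = {0, 0},
     steps = {{1, 0}, {-1, 0}, {0, 1}, {0, -1}},
     path = Table[{0, 0}, {n + 1}]},
    Do[
     pos += steps[[RandomInteger[{1, 4}]]];
     path[[i + 1]] = pos,
     {i, n}];
    path],
   RuntimeAttributes -> {Listable}, Parallelization -> True];

(* Serial: BlockRandom restores the generator state, so two runs agree. *)
s1 = BlockRandom[Table[walkSerial[20], {8}]];
s2 = BlockRandom[Table[walkSerial[20], {8}]];
s1 === s2

(* Parallel: thread assignment may vary, so the two runs may differ. *)
p1 = BlockRandom[walkParallel[ConstantArray[20, 8]]];
p2 = BlockRandom[walkParallel[ConstantArray[20, 8]]];
p1 === p2

(* Walks that appear in both parallel runs, regardless of order. *)
Length[Intersection[p1, p2]]
```

Returning whole paths rather than endpoints makes accidental matches between unrelated walks very unlikely, so a nonempty intersection indicates walks that were genuinely repeated in a different order.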
Typically when you use BlockRandom or SeedRandom you will want to do it outside the parallel computation. If you want to use these commands inside a parallel evaluation they will only affect the current execution thread, and there are subtleties described in "SeedRandom and BlockRandom in Parallel Computations" in "Random Number Generation" that it is good to be aware of.
You can change the random generators for parallel computation using SeedRandom[seed,Method->"ParallelGenerator"]. The default random generators for parallel computation are a set of 1024 Mersenne Twister generators that produce good quality random numbers. The Wolfram Language uses a different one of these generators for each execution thread in a parallel computation. An important feature of these is that each one produces random numbers independent from the numbers on the others, so there should be no correlations between computations done on different execution threads. A demonstration of this is to do a standard test of random numbers, called the blocking test.
The blocking test checks that sample means of numbers generated from a given distribution converge to the normal distribution as they should by the central limit theorem. If they do not converge, this is an indication of a possible problem with the distribution.
The following defines a CompiledFunction that will get the sample mean from n integers that are equally likely to be 0 or 1, and will run in parallel when given a list argument.
It is hard to tell visually if one is better than another. A better way is to use DistributionFitTest to get the p-value for the goodness-of-fit hypothesis test with null hypothesis that the data has the same distribution as the expected distribution and alternative hypothesis that it does not.
The test statistic of DistributionFitTest has a distribution itself that should be the (continuous) uniform distribution, so the best information comes from comparing the parallel and serial over many runs.
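A minimal sketch of the blocking test follows. The names `sampleMean`, `size`, and `runs` and the specific sizes are assumptions for illustration. For n integers equally likely to be 0 or 1, the central limit theorem gives an approximately normal sample mean with mean 1/2 and standard deviation 1/(2 Sqrt[n]). Because the sample means are discrete, DistributionFitTest may warn about ties; the p-values vary from run to run, so none are shown.

```wolfram
(* Sample mean of n integers that are 0 or 1 with equal probability. *)
sampleMean = Compile[{{n, _Integer}},
   N[Total[RandomInteger[{0, 1}, n]]]/n,
   RuntimeAttributes -> {Listable}, Parallelization -> True];

size = 100000;  (* integers per sample mean *)
runs = 500;     (* number of sample means *)

(* Parallel: one list call threads over all the runs. *)
parallelMeans = sampleMean[ConstantArray[size, runs]];

(* Serial: repeated scalar calls run one at a time. *)
serialMeans = Table[sampleMean[size], {runs}];

(* Expected limiting distribution from the central limit theorem. *)
dist = NormalDistribution[1/2, 1/(2 Sqrt[size])];

(* p-values for the goodness-of-fit test. *)
DistributionFitTest[parallelMeans, dist]
DistributionFitTest[serialMeans, dist]
```

A single pair of p-values says little on its own; repeating the whole computation many times and comparing how the parallel and serial p-values are distributed gives a fairer picture.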
Compiled functions execute in parallel using multiple threads of execution. The number of threads is set initially to be $ProcessorCount.
The actual setting can be modified with SystemOptions using the "ParallelThreadNumber" suboption of "ParallelOptions".
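The following sketch shows one way to inspect and change the thread count. It assumes the suboption is "ParallelThreadNumber" under "ParallelOptions"; the variable `oldThreads` and the value 2 are hypothetical.

```wolfram
(* Number of processor cores, used as the initial thread count. *)
$ProcessorCount

(* Read the current thread setting. *)
SystemOptions["ParallelOptions" -> "ParallelThreadNumber"]

(* Save the current value so it can be restored later. *)
oldThreads = "ParallelThreadNumber" /.
   ("ParallelOptions" /. SystemOptions["ParallelOptions"]);

(* Use two threads for parallel compiled functions. *)
SetSystemOptions["ParallelOptions" -> "ParallelThreadNumber" -> 2];

(* Restore the previous setting. *)
SetSystemOptions["ParallelOptions" -> "ParallelThreadNumber" -> oldThreads];
```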