PARALLEL PACKAGE TUTORIAL

Concurrency: Managing Parallel Processes
This feature is not supported on the Wolfram Cloud.

Processes and Processors

A process is simply a Wolfram Language expression being evaluated. A processor is a parallel kernel that performs such evaluations.

The command ParallelEvaluate discussed in the tutorial "Parallel Evaluation" will send an evaluation to an explicitly given processor, requiring you to keep track of available processors and processes yourself. The scheduling functions discussed in this tutorial perform these functions for you. You can create any number of processes, many more than the number of available processors. If more processes are created than there are processors, the remaining processes will be queued and serviced when a processor becomes available.

Starting and Waiting for Processes

The two basic commands are ParallelSubmit[expr] to line up an expression for evaluation on any available processor, and WaitAll[pid] to wait until a given process has finished.

Each process in the queue is identified by its unique evaluation ID, or eid.

ParallelSubmit[cmd]submits cmd for evaluation on a remote kernel and returns the queued job's eid
ParallelSubmit[{vars },cmd]builds a closure for the local values of the given variables before sending cmd to a remote kernel
WaitAll[eid]waits for the given process to finish and returns its result
WaitAll[{eid1,eid2,}]waits for all given processes and returns the list of results
WaitAll[expr]waits for all evaluation IDs contained in expr to finish and replaces them by the respective process' result
WaitNext[{eid1,eid2,}]waits for one of the given processes to finish. It returns , where id is the pid of the finished process, res is its result, and ids is the list of remaining eids

Queuing processes.

WaitNext is nondeterministic. It returns an arbitrary process that has finished. If no process has finished, it waits until a result is available. The third element of the result of WaitNext is suitable as an argument of another call to WaitNext.

The functions ParallelSubmit and WaitAll implement concurrency. You can start arbitrarily many processes, and they will all be evaluated eventually on any available remote processors. When you need a particular result, you can wait for any particular pid, or you can wait for all results using repeated calls to WaitNext.

Basic Usage

To try the examples here, start a few parallel kernels.

In[14]:=
Click for copyable input
Out[14]=

Queue the evaluation for processing on a remote kernel. Note that has the attribute to prevent evaluation of the expression before queuing. The value returned by is the process ID (pid) of the queued process.

In[2]:=
Click for copyable input
Out[2]=

After queuing a process, you can perform other calculations and eventually collect the result. If the result is not yet available, will wait for it.

In[3]:=
Click for copyable input
Out[3]=

You can queue several processes. Here, the expression , , , is queued for evaluation.

In[4]:=
Click for copyable input
Out[4]=

Next, wait for any one process to finish.

In[5]:=
Click for copyable input
Out[5]=

Note the reassignment of to the list of remaining process IDs. Repeating the previous evaluation until the list becomes empty allows you to drain the queue.

In[6]:=
Click for copyable input
Out[6]=

You can also wait for all of the remaining processes to finish.

In[7]:=
Click for copyable input
Out[7]=

A Note on the Use of Variables

If an expression e in ParallelSubmit[e] involves variables with assigned values, care must be taken to ensure that the remote kernels have the same variable values defined. Unless you use DistributeDefinitions or shared variables, locally defined variables will not be available to remote kernels. See Values of Variables in the tutorial "Parallel Evaluation" for more information.

Here are a few common cases where there may be problems.

This assigns the value to the variable in the local master kernel.

In[1]:=
Click for copyable input

You want to evaluate the expression Head[a] on a remote kernel. The result is not , as it is on the local kernel, because on the remote kernel the symbol does not have a value.

In[2]:=
Click for copyable input
Out[2]=
In[3]:=
Click for copyable input
Out[3]=

You can use a local constant, and even name it , to textually insert the value of into the argument of ParallelSubmit.

In[4]:=
Click for copyable input
Out[4]=

To make this frequent case simpler, you can use an optional first argument in ParallelSubmit to declare the variables. The variables will then be inserted into the expression in the second argument.

In[5]:=
Click for copyable input
Out[5]=

The syntax ParallelSubmit[{vars },expr] is by design similar to Function[{vars },expr]. Both form closures where the variables are bound to their values.

Iterator variables behave in the same way. In the following two outputs, the parallel evaluation does not give the correct result.

In[6]:=
Click for copyable input
Out[6]=
In[7]:=
Click for copyable input
Out[7]=

Insert the iterator variable as a constant or declare a closure to ensure that you are getting correct results, as is done with the following command.

In[8]:=
Click for copyable input
Out[8]=

Note that ParallelTable[] treats the iterator variable correctly.

In[9]:=
Click for copyable input
Out[9]=

Lower-Level Functions

The context contains functions to control the process queue.

$Queuethe list of evaluations submitted with ParallelSubmit but not yet assigned to an available kernel
$QueueLengthgives the length of the input queue
ResetQueues[]waits for all running processes and abandons any queued processes
QueueRun[]collects finished evaluations from all kernels and assign new ones from the queue

Evaluation queue control.

returns True if at least one evaluation was submitted to a kernel or one result received from a kernel and False otherwise. Normally, you should not have to run yourself. It is called at appropriate places inside WaitAll and other functions. You need it only if you implement your own main loop for a concurrent program.

Access the developer functions.
In[22]:=
Click for copyable input
Submit a number of evaluations.
In[34]:=
Click for copyable input
Out[34]=
Each time you invoke , you can see a change in the state of the evaluations displayed above.
In[35]:=
Click for copyable input
Out[35]=
In[36]:=
Click for copyable input
Out[36]=
This is the number of evaluations still waiting to be serviced.
In[37]:=
Click for copyable input
Out[37]=
Finally, let them run to completion.
In[38]:=
Click for copyable input
Out[38]=

Working with Process IDs

WaitAll[{pid1,pid2,}] is merely a simple form of a general mechanism to parallelize computations. WaitAll can take any expression containing pids in its arguments and will wait for all associated processes to finish. The pids will then be replaced by the results of their processes.

You can view WaitAll as the inverse of ParallelSubmit; that is, WaitAll[ParallelSubmit[expr]] gives expr, evaluated on a remote kernel, just as expr itself is evaluated locally. Further, WaitAll[ ParallelSubmit[e1] ParallelSubmit[en]] is equivalent to , where each is evaluated in parallel. Here the ellipses represent an arbitrary surrounding computation.

The pids generated by an instance of ParallelSubmit should be left intact and should neither be destroyed nor duplicated before WaitAll performs its task. The reason is that each of them represents an ongoing parallel computation whose result should be collected exactly once.

Examples of expressions that leave pids intact follow.

    • Pids in a list are safe, because the list operation does not do anything to its arguments; it merely keeps them together. Nested lists are also safe for the same reason.
    • Pids are symbolic objects that are not affected by Plus. They may be reordered, which is irrelevant. Most arithmetic operations are safe.
    • Mapping a function involving ParallelSubmit onto a list is safe because the result will contain the list of the pids.
    • Table returns lists of pids and is, therefore, safe.

    Examples of expressions that are not safe include the following.

    • The Head operation will destroy the symbolic pid, as will other structural operations such as Length, ByteCount, and so on.
    • Multiplying a pid by 0 will destroy it.
    • Do does not return anything, so all pids are lost. A similar case is Scan.

    To recover from a computation where pids were destroyed or duplicated, use the command AbortKernels[].

    ProcessID[pid]a unique integer identifying the process
    Process[pid]the expression representing the process
    Scheduling[pid]the priority assigned to the process
    ProcessState[pid]the state of the process: queued, running, finished

    Properties of process IDs (in the developer context).

    In[15]:=
    Click for copyable input
    Here, several processes are queued.
    In[16]:=
    Click for copyable input
    Out[16]=
    Because the scheduler has not yet been running, all of them are still queued for evaluation.
    In[17]:=
    Click for copyable input
    Out[17]//TableForm=
    To demonstrate how it works, invoke the scheduler by hand.
    In[18]:=
    Click for copyable input
    Out[18]=

    Now some processes are running on the available processors; some may already have finished. Note that some of this information is also available in the default output form of evaluations.

    In[19]:=
    Click for copyable input
    Out[19]//TableForm=

    WaitAll[] invokes the scheduler until all processes are finished and returns their results. Note that the priorities are not used with the default queue type.

    In[20]:=
    Click for copyable input
    Out[20]=

Examples

Before evaluating these examples, make sure that you can run parallel kernels.
In[55]:=
Click for copyable input
Out[55]=

An Infinite Calculation

If you want to verify that the polynomials for are all irreducible (an open conjecture), you can test them with IrreduciblePolynomialQ.

This computation will continue forever. To stop it, abort the local evaluation by pressing . or choosing Kernel Abort Evaluation. After the abort, collect any waiting results as follows.

Here is the definition of the polynomial in with degree .
In[56]:=
Click for copyable input
You then make this definition known to all remote kernels.
In[57]:=
Click for copyable input
Now you can start the computation, requiring that it print each result as it goes on. To stop the computation, abort it by choosing Kernel Abort Evaluation or pressing .. The explicit call of is necessary in such examples where you program your own scheduling details.
In[58]:=
Click for copyable input
Out[59]=
Do not forget to collect the orphaned processes after an interrupt (or use AbortKernels[].)
In[60]:=
Click for copyable input
Out[60]=

Automatic Process Generation

A general way to parallelize many kinds of computation is to replace a function g occurring in a functional operation by Composition[ParallelSubmit,g]. This new operation will cause all instances of calls of the function g to be queued for parallel evaluation. The result of this composition is a process ID (pid) that will appear inside the structure constructed by the outer computation where g occurred. To put back the results of the computation of g, wrap the whole expression in WaitAll. This will replace any pid inside its expression by the result returned by the corresponding process.

Here are a few examples of such functional compositions.

Parallel Mapping

A parallel version of Map is easy to develop. The sequential Map wraps a function around all elements in a list.

In[1]:=
Click for copyable input
Out[1]=

Simply use Composition[ParallelSubmit,f] instead of to schedule all mappings for parallel execution. The result is a list of pids.

In[2]:=
Click for copyable input
Out[2]=

Finally, simply wait for the processes to finish. Every pid will be replaced with the result of its associated process.

In[3]:=
Click for copyable input
Out[3]=

Parallel Inner Products

To see how this works for symbolic inner products, assume you want a generalized inner product where is the common last dimension of and first dimension of . Think of as Plus and as Times.

Here is an example with .

In[4]:=
Click for copyable input
Out[4]=

You can use Composition[ParallelSubmit,p] in place of in the previous expression to queue all calculations of for concurrent execution. The result is a tensor of process IDs.

In[5]:=
Click for copyable input
Out[5]=

Now, simply wait for all processes in this expression.

In[6]:=
Click for copyable input
Out[6]=

Parallel Tables and Sums

This code generates a 55 matrix of random numbers, where each row is computed in parallel.

In[7]:=
Click for copyable input
Out[7]=

Here is a sum whose elements are computed in parallel. Each element of the sum is a numerical integration.

In[8]:=
Click for copyable input
Out[8]=

Here is the corresponding table of parallel exact results. The value of is .

In[9]:=
Click for copyable input
Out[9]//TableForm=

Comparison with Parallelize

Parallel mapping, tables, and inner products were already introduced in "Parallel Evaluation". Those functions divide the task into batches of several subproblems each, under control of the method option. The functions in this section generate one evaluation for each subproblem. This division is equivalent to the setting .

If all subproblems take the same amount of time, the functions such as ParallelMap[] and ParallelTable[] are faster. However, if the computation times of the subproblems are different, and not easy to estimate in advance, it can be better to use WaitAll[ ParallelSubmit[]] as described in this section or use the equivalent method option setting. If the number of processes generated is larger than the number of remote kernels, this method performs automatic load balancing, because jobs are assigned to a kernel as soon as the previous job is done, and all kernels are kept busy all the time.