This is documentation for an obsolete product.


A remote kernel that is in use may fail at any time because of hardware, network, or software problems. Parallel Computing Toolkit notices the failure the next time it tries to send a command to the kernel or to read a result from it, and reports it with the message Parallel::rdead.
If the failed kernel had any processes assigned to it, these processes are lost. Without failure recovery, a program that uses Wait on one of these processes never terminates, because the process never returns.
Because Parallel Computing Toolkit keeps track of the commands submitted to remote kernels, it can reassign these commands to another available remote kernel if a remote kernel fails. Alternatively, it may simply terminate the waiting processes with the result $Failed, which indicates failure. The chosen behavior is determined by the value of the variable $RecoveryMode.
$RecoveryMode              gives the current setting of the failure recovery mode
$RecoveryMode = None       does not perform any failure recovery
$RecoveryMode = Abandon    lets processes assigned to a failed kernel return with result $Failed (default)
$RecoveryMode = ReQueue    reassigns processes on the failed kernel to another kernel

Possible failure recovery modes.
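
As a minimal sketch of the settings listed above, the recovery mode can be inspected and changed by ordinary assignment. The snippet assumes that Parallel Computing Toolkit is already loaded and that remote kernels have been launched; neither step is shown here.

```mathematica
(* Inspect the current failure recovery mode (Abandon by default). *)
$RecoveryMode

(* Reassign the processes of a failed kernel to another available kernel. *)
$RecoveryMode = ReQueue;

(* Return to the default: lost processes come back with the result $Failed. *)
$RecoveryMode = Abandon;
```

Setting the mode before any processes are queued is the safer choice here, since the text does not say how a change affects processes that are already running.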

The ReQueue recovery mode lets you finish a computation as long as at least one kernel remains usable. However, it may give wrong results if the remote computations produce side effects or your computation depends on a certain number of available remote kernels. Side effects are usually present if you use virtual shared memory. There is also the possibility of a deadlock if a process on a failed kernel acquired, but never released, a shared resource.
You can use the Abandon recovery mode to implement your own failure recovery method.
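
One way such a custom recovery method might look is sketched below: queue a batch of processes, collect the results, and resubmit only the inputs whose result came back as $Failed. This is an illustration under stated assumptions, not the toolkit's own mechanism. It assumes that the toolkit is loaded and remote kernels are running, that Wait applied to a list of processes returns the results in the same order, and that the queued computation (here the built-in Prime, chosen only because it has no side effects) is safe to run twice. The names inputs, pids, results, and failed are hypothetical.

```mathematica
(* Let processes on a failed kernel return $Failed instead of blocking. *)
$RecoveryMode = Abandon;

(* Hypothetical workload: one queued process per input value. *)
inputs = Range[10];
pids = Queue[Prime[#]] & /@ inputs;

(* Collect all results; entries for lost processes are $Failed. *)
results = Wait[pids];

(* Indices of the inputs whose process was lost. *)
failed = Flatten[Position[results, $Failed, {1}]];

(* Resubmit only the lost work and merge the new results in place. *)
If[failed =!= {},
  results[[failed]] = Wait[Queue[Prime[#]] & /@ inputs[[failed]]]
];

results
```

The sketch retries once; if another kernel fails during the retry, the merged list can still contain $Failed, so a real implementation would probably repeat the check with an upper bound on the number of attempts.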
Failure recovery affects only processes started with Queue[] and collected with Wait[]. Other parallel commands, such as ParallelEvaluate[], cannot handle a failed remote kernel and always return $Failed in such cases.