Random Number Generation
Introduction
The ability to generate pseudorandom numbers is important for simulating events, estimating probabilities and other quantities, making randomized assignments or selections, and numerically testing symbolic results. Such applications may require uniformly distributed numbers, nonuniformly distributed numbers, elements sampled with replacement, or elements sampled without replacement.
The functions
RandomReal,
RandomInteger, and
RandomComplex generate uniformly distributed random numbers.
RandomReal and
RandomInteger also generate numbers for built-in distributions.
RandomPrime generates primes within a range. The functions
RandomChoice and
RandomSample sample from a list of values with or without replacement. The elements may have equal or unequal weights. A framework is also included for defining additional methods and distributions for random number generation.
A sequence of nonrecurring events can be simulated via
RandomSample. For instance the probability of randomly sampling the integers 1 through
n in order might be simulated.
This estimates the probability of getting n elements in order for n from 2 to 8.
| Out[1]= |  |
|
The results can be compared with the theoretical probabilities.
| Out[2]= |  |
|
Random number generation is at the heart of Monte Carlo estimates. An estimate of an expected value of a function
f can be obtained by generating values from the desired distribution and finding the mean of
f applied to those values.
This estimates the 6 th raw moment for a normal distribution.
| Out[3]= |  |
|
In this case, the estimate can be compared with an exact result.
| Out[4]= |  |
|
Random processes can be simulated by generating a series of numbers with the desired properties. A random walk can be created by recursively summing pseudorandom numbers.
Here a random walk starting at 0 is created.
| Out[5]= |  |
|
Substitution of random numbers can be used to test the equivalence of symbolic expressions. For instance, the absolute difference between two expressions could be evaluated at randomly generated points to test for inequality of the expressions.
This provides no evidence that  and x are different for real values.
| Out[6]= |  |
|
This provides evidence that  and x differ for at least some complex values.
| Out[7]= |  |
|
RandomPrime chooses prime numbers with equal probability, which can be useful—for instance, to generate large primes for RSA encryption. The prime numbers are uniformly distributed on the primes in the range but are not uniformly distributed on the entire range because primes are in general not uniformly distributed over ranges of positive integers.
Primes in a given range are generated with equal probability.
| Out[8]= |  |
|
Random Generation Functions
The main functions are
RandomReal,
RandomInteger,
RandomComplex,
RandomChoice, and
RandomSample.
RandomReal,
RandomInteger, and
RandomComplex generate numbers given some range of numeric values.
RandomChoice and
RandomSample generate elements from finite sets that may include non-numeric values.
Random Numbers
RandomReal generates pseudorandom real numbers over a specified range of real values.
RandomInteger generates pseudorandom integer numbers over a specified range of integer values.
RandomComplex generates pseudorandom complex numbers over a specified rectangular region in the complex plane.
RandomPrime generates prime numbers with equal probability within a range.
| RandomReal[] | give a pseudorandom real number in the range 0 to 1 |
| RandomReal[{xmin,xmax}] | give a pseudorandom real number in the range xmin to xmax |
| RandomReal[xmax] | give a pseudorandom real number in the range 0 to xmax |
| RandomReal[dist] | give a random number from the continuous distribution dist |
| RandomReal[domain,n] | give a list of n pseudorandom reals |
| RandomReal[domain,{n1,n2,...}] | give an n1×n2×... array of pseudorandom reals |
Generation of random reals.
| RandomInteger[{imin,imax}] | give a pseudorandom integer in the range {imin, ..., imax} |
| RandomInteger[imax] | give a pseudorandom integer in the range {0, ..., imax} |
| RandomInteger[] | pseudorandomly give 0 or 1 with probability  |
| RandomInteger[dist] | give a pseudorandom integer from the discrete distribution dist |
| RandomInteger[domain,n] | give a list of n pseudorandom integers |
| RandomInteger[domain,{n1,n2,...}] | give an n1×n2×... array of pseudorandom integers |
Generation of random integers.
| RandomComplex[] | give a pseudorandom complex number in the unit square |
| RandomComplex[{zmin,zmax}] | give a pseudorandom complex number in the rectangle bounded by zmin and zmax |
| RandomComplex[zmax] | give a pseudorandom complex number in the rectangle bounded by 0 and zmax |
| RandomComplex[domain,n] | give a list of n pseudorandom complex numbers |
| RandomComplex[domain,{n1,n2,...}] | give an n1×n2×... array of pseudorandom complex numbers |
Generation of random complex numbers.
| RandomPrime[{imin,imax}] | give a pseudorandom prime in the range {imin, ..., imax} |
| RandomPrime[imax] | give a pseudorandom prime in the range 2 to imax |
| RandomPrime[domain,n] | give a list of n pseudorandom primes |
| RandomPrime[domain,{n1,n2,...}] | give an n1×n2×... array of pseudorandom primes |
Generation of random primes.
When the domain is specified in terms of
xmin and
xmax,
RandomReal and
RandomInteger generate uniformly distributed numbers over the specified range. When the domain is specified as a distribution, rules defined for the distribution are used. Additionally, mechanisms are included for defining
new methods and distributions.
The two-argument interface provides a convenient way to obtain multiple random numbers at once. Even more importantly, there is a significant efficiency advantage to generating a large number of pseudorandom numbers at once.
Generating 107 numbers between 0 and 1 takes a fraction of a second.
| Out[9]= |  |
|
Generating 107 numbers one at a time takes roughly five times as long.
| Out[10]= |  |
|
For multidimensional arrays with dimensions
n1 through
nk, the total number of required pseudorandom numbers
ntotal=
ni is generated and then partitioned. This makes the multidimensional array generation as efficient as possible because the total number of random values is generated as efficiently as possible and the time required for partitioning is negligible.
The time required for a 100×100×100×10 array is about the same as for a vector of 107 numbers.
| Out[11]= |  |
| Out[12]= |  |
|
An array of the same dimensions generated 10 numbers at a time takes several times as long.
| Out[13]= |  |
|
For statistical distributions, the speed advantage of generating many numbers at once can be even greater. In addition to the efficiency benefit inherited from the uniform number generators used, many statistical distributions also benefit from vectorized evaluation of elementary and special functions. For instance,
WeibullDistribution benefits from vector evaluations of the elementary functions
Power,
Times, and
Log.
Generating 105 Weibull numbers takes virtually no time.
| Out[14]= |  |
|
Several seconds are required when 105 Weibulls are generated one at a time.
| Out[15]= |  |
|
Random number generation can be useful in exploratory investigations. For instance, you might look for occurrences of a random sequence of digits in a longer sequence of digits.
This converts a list of 5 random decimal digits to a string.
| Out[16]= |  |
|
The following converts the first million digits of  to a string of integers. |
This gives the positions where the string of five digits appear in the first million digits of  .
| Out[18]= |  |
|
Random number generation is also highly useful in estimating distributions for which closed-form results are not known or known to be computationally difficult. Properties of random matrices provide one example.
This estimates the probability a 5×5 matrix of uniform reals will have real eigenvalues.
| Out[19]= |  |
|
The following does the same for a matrix of standard normal numbers.
| Out[20]= |  |
|
An example of simulating a multivariate distribution is the Gibbs sampler used in Bayesian statistics [
1]. The Gibbs sampler provides a means by which to simulate values from multivariate distributions provided the distributions of each coordinate conditional on the other coordinates are known. Under some restrictions, the distribution of random vectors constructed by iteratively sampling from the conditional distributions will converge to the true multivariate distribution.
The following example will construct a Gibbs sampler for an example given by Casella and George [
2]. The distribution of interest is bivariate. The conditional distribution of
x given
y is a binomial, and the conditional distribution of
y given
x is a beta. As Casella and George mention, various strategies for detecting convergence and sampling using the Gibbs sampler have been suggested. For simplicity, assume that convergence will occur within 1000 iterations. A sample of size
n from the distribution will be taken as the
n values following the 1000
th iteration. It should be noted that these
n values will, however, be dependent.
This defines the sampler with a binomial and a beta conditional distribution. |
A Gibbs sampler could also be defined as a distribution object within the distribution framework for random number generation. An example of this particular
Gibbs sampler as a distribution object is provided in the section "
Defining Distributions".
data is a sample of length 104. |
The following bar chart shows the marginal distribution of the first dimension.
| Out[25]= |  |
|
The marginal distribution of the second coordinate can be visualized with a histogram.
| Out[27]= |  |
|
Conditional distributions should closely match the assumed binomial and beta distributions provided there is enough data for the conditional distribution. The greatest amount of data occurs when the densities of the marginal distributions are highest, so those values can be used for comparisons. The following graphics compare the empirical and assumed conditional distributions, using bins of width .05 for estimating probabilities of continuous values.
This compares the empirical and theoretical distributions of x for 0.3`≤y<0.35`.
| Out[28]= |  |
|
This compares the empirical and theoretical distributions of y for x=1.
| Out[29]= |  |
|
Arbitrary-Precision Reals and Complexes
By default,
RandomReal and
RandomComplex generate machine-precision numbers. Arbitrary-precision numbers can be obtained by setting the
WorkingPrecision option.
Option for RandomReal and RandomComplex.
The option is valid for uniformly distributed reals, complexes, and reals from built-in distributions.
WorkingPrecision can also be incorporated into user-defined distributions.
Here is a precision-25 real number between 5 and 50.
| Out[30]= |  |
|
This gives a precision-50 t-distributed number.
| Out[31]= |  |
|
Increased
WorkingPrecision can be useful in simulations where loss of precision can be expected and highly accurate results are necessary. Increased precision can also be used to estimate the precision loss in computations.
This estimates the worst precision loss in computing J1 on the interval [0, 1000].
| Out[32]= |  |
|
If the precision of the input is less than the specified
WorkingPrecision, the function will warn of the problem. The precision of the input will then be artificially increased to generate a pseudorandom number of the desired precision.
A warning is generated because the machine number 7.5 has precision less than 50.
| Out[33]= |  |
|
WorkingPrecision is not an option for
RandomInteger. Integers have infinite precision, so the precision is completely specified by the function name.
| Out[34]= |  |
|
Random Elements
RandomChoice and
RandomSample generate pseudorandom selections from a list of possible elements. The elements can be numeric or nonnumeric.
| RandomChoice[{e1,e2,...}] | give a pseudorandom choice of one of the ei |
| RandomChoice[list,n] | give a list of n pseudorandom choices from list |
| RandomChoice[list,{n1,n2,...}] | give n1×n2×... pseudorandom choices from list |
| RandomChoice[{w1,w2,...}->{e1,e2,...}] |
| give a pseudorandom choice weighted by the wi |
| RandomChoice[wlist->elist,n] | give a list of n weighted choices |
| RandomChoice[wlist->elist,{n1,n2,...}] |
| give an array of n1×n2×... array of weighted choices |
Random choice from a list.
| RandomSample[{e1,e2,...},n] | give a pseudorandom sample of n of the ei |
| RandomSample[{w1,w2,...}->{e1,e2,...},n] |
| give a pseudorandom sample of n of the ei chosen using weights wi |
| RandomSample[{e1,e2,...}] | give a pseudorandom permutation of the ei |
| RandomSample[wlist->elist] | give a pseudorandom permutation of elist using initial weights wlist |
Random sample from a list.
The main difference between
RandomChoice and
RandomSample is that
RandomChoice selects from the
ei with replacement, while
RandomSample samples without replacement. The number of elements chosen by
RandomChoice is not limited by the number of elements in
elist, and an element
ei may be chosen more than once. The size of a sample returned by
RandomSample is limited by the number of elements in
elist, and the number of occurrences of a distinct element in that sample is limited by the number of occurrences of that element in
elist.
If the first argument to
RandomChoice or
RandomSample is a list, elements are selected with equal probability. The weight specification defines a distribution on the set of the
ei. The weights must be positive, but need not sum to 1. For weights
{w1, ..., wn} the probability of
ei in the initial distribution is
wi/
wj. Since
RandomSample samples without replacement, weights are updated internally based on the total remaining weight after each selection.
RandomChoice can be used for simulation of independent identically distributed events with a finite list of possible outcomes.
This gives 15 simulated fair coin tosses.
| Out[35]= |  |
|
This gives 20 rolls of a die loaded toward 5s.
| Out[36]= |  |
|
RandomChoice can be used to generate observations from any discrete distribution with finite support.
| Out[37]= |  |
|
Here is the empirical PDF for 1000 simulated points.
| Out[40]= |  |
|
RandomSample can be used to simulate observations from a finite set of outcomes in which each element in the list of outcomes can only be observed once. There may be more than one occurrence of distinct values in the list.
This simulates 7 draws from a container of 80 blue and 45 red objects.
| Out[41]= |  |
|
Randomly sampling all elements in the list results in a random permutation.
The following is a random permutation of the integers from 1 to 10.
| Out[42]= |  |
|
Assigning weights to the elements results in a random permutation in which values with greater weight tend to appear earlier in the permutation than values with lesser weight.
Here is a random permutation weighted by the squares of the data values.
| Out[43]= |  |
|
For the same list of weighted or unweighted elements,
RandomSample[#, 1]& is distributionally equivalent to
RandomChoice.
This gives an empirical PDF for 105 random samples of size 1.
| Out[45]= |  |
| Out[46]= |  |
|
Here is an empirical distribution for a distributionally equivalent RandomChoice.
| Out[47]= |  |
|
The probabilities for the two examples are very close to each other and to the theoretical values.
These are the theoretical probabilities.
| Out[48]= |  |
|
RandomSample can also be used for random assignments to groups, such as in clinical trials. The following uses integers, but other identifying values such as name or identification number could be used instead.
The following randomly places 20 elements into four groups of equal size.
| Out[49]= |  |
|
RandomChoice and
RandomSample can be affected by changes to the
Method option to
SeedRandom. Built-in methods are described in
"Methods". Additionally, mechanisms for defining new methods are described in
"Defining Your Own Generator".
Seeding and Localization
Pseudorandom number generators algorithmically create numbers that have some apparent level of randomness. Methods for pseudorandom number generation typically use a recurrence relation to generate a number from the current state and to establish a new state from which the next number will be generated. The state can be set by seeding the generator with an integer that will be used to initialize the recurrence relation in the algorithm.
Given an initial starting point, called a seed, pseudorandom number generators are completely deterministic. In many cases it is desirable to locally or globally set the seed for a random number generator to obtain a constant sequence of "random" values. If set globally, the seed will affect future pseudorandom numbers unless a new seed is explicitly set. If set locally, the seed will only affect random number and element generation within the localized code.
| BlockRandom[expr] | evaluate expr with all pseudorandom generators localized |
| SeedRandom[n] | reset the pseudorandom generator using n as a seed |
| SeedRandom[] | reset the generator using as a seed the time of day and certain attributes of the current Mathematica session |
Localization and seeding functions.
The
SeedRandom function provides a means by which to seed the random generator. Used on its own,
SeedRandom will globally set the seed for random generators. The
BlockRandom function provides a means by which to locally set or change the seed for random generators without affecting the global state.
The following seeds the random generator globally.
| Out[50]= |  |
|
The following gives two different numbers because the first
RandomReal is generated within
BlockRandom, while the second is generated outside of
BlockRandom.
The second RandomReal is not generated using the seed 1.
| Out[51]= |  |
|
SeedRandom also provides the mechanism for switching the random generator.
Option for SeedRandom.
An individual generator can be seeded directly by specifying that generator via the
Method option. All generators can be seeded by setting
Method->All.
Here the default generator is seeded with 1, but the "Rule30CA" generator is not.
| Out[52]= |  |
|
Seeding the "Rule30CA" generator with 1 gives a different random number.
| Out[53]= |  |
|
Methods
Five pseudorandom generator methods are available on all systems. A sixth platform-dependent method is available on Intel-based systems. A framework for defining new methods, described in the section "
Defining Your Own Generator", is also included.
| "Congruential" | linear congruential generator (low-quality randomness) |
| "ExtendedCA" | extended cellular automaton generator (default) |
| "Legacy" | default generators prior to Mathematica 6.0 |
| "MersenneTwister" | Mersenne Twister shift register generator |
| "MKL" | Intel MKL generator (Intel-based systems) |
| "Rule30CA" | Wolfram rule 30 generator |
Built-in methods.
This gives pseudorandom integers from each method with seed 2020.
| Out[54]= |  |
|
This gives pseudorandom reals from the same seed.
| Out[55]= |  |
|
Congruential
"Congruential" uses a linear congruential generator. This is one of the simplest types of pseudorandom number generators, with pseudorandom numbers between 0 and 1 obtained from
xi/m, where
xi is given by the modular recurrence relation
for some fixed integers
b,
c, and
m called the multiplier, increment, and modulus respectively. If the increment is 0, the generator is a multiplicative congruential generator. The values of
b,
c, and
m can be set via options to the
"Congruential" method.
| | |
| "Bits" | Automatic | specify range of bits to use for numbers constructed from bits |
| "Multiplier" | 1283839219676404755 | multiplier value |
| "Increment" | 0 | increment value |
| "Modulus" | 2305843009213693951 | modulus value |
| "ConvertToRealsDirectly" | True | whether reals should be constructed directly from the congruence relation |
Options for Method "Congruential".
Linear congruential generators are periodic and tend to give a lower quality of randomness, especially when a large number of random values is needed. If reals are generated directly from the congruence relation, the period is less than or equal to
m.
The default option values are chosen to have a large period and for 64-bit efficiency. With the default options, the
"Congruential" generator passes many standard tests of randomness despite the inherent issues with congruential number generators.
This generates 40 numbers from a multiplicative congruential generator. |
The period of a multiplicative congruential generator is bounded above by the number of positive integers less than or equal to the modulus that are relatively prime to the modulus. This upper bound is Euler's totient function of the modulus.
With a modulus of 63, the period of the cycle is at most 36.
| Out[57]= |  |
|
The actual period can be determined by finding the smallest integer
i such that
i
bi mod
m.
The period with multiplier 11 and modulus 63 is 6.
| Out[58]= |  |
|
Partitioning the data into sets of 6 elements shows the recursion.
| Out[59]= |  |
|
The distinct numbers can also be seen graphically by plotting a sequence of generated numbers.
Here is a plot of 1000 values from the congruential generator.
| Out[60]= |  |
|
If
"ConvertToRealsDirectly" is set to
False, reals are generated by taking eight bits at a time from elements of the sequence to construct a 52-bit machine-precision number. Congruential numbers generated in this fashion will still cycle, but cycling will depend on repetition in the bit pattern rather than in the initial congruence relation.
The
"Bits" option can be
Automatic, a nonzero integer, or a list of two nonzero integers specifying the range of bits in the modulus
m used for constructing numbers from bits.
Automatic uses
{2, -1} unless
m is a power of 2, in which case
{1, -1} is used.
ExtendedCA
The default
"ExtendedCA" method makes use of cellular automata to generate high-quality pseudorandom numbers. This generator uses a particular five-neighbor rule, so each new cell depends on five nonadjacent cells from the previous step.
Cellular-automata-based random number generators evolve a state vector of 0s and 1s according to a deterministic rule. For a given cellular automaton, an element (or cell) at a given position in the new state vector is determined by certain neighboring cells of that cell in the old state vector. A subset of cells in the state vectors are then output as random bits from which the pseudorandom numbers are generated.
The cellular automaton used by
"ExtendedCA" produces an extremely high level of randomness. It is so high that even using every single cell in output will give a stream of bits that passes many randomness tests, in spite of the obvious correlation between one cell and five previous ones.
Two options are included for modifying the size of the state vector and the cells skipped. The defaults are chosen for quality and speed and there is typically no need to modify these options.
| | |
| "Size" | 80 | state vector size as a multiplier of 64 |
| "Skip" | 4 | number of cells to skip |
Options for Method "ExtendedCA".
The length of the state vectors used is by default set to 80×64=5120 cells. The multiple of 64 can be controlled by the
"Size" option.
In practice using every fourth cell in each state vector proves to be sufficient to pass very stringent randomness tests. This is the default used for the
"Skip" option. For even faster random number generation, a
"Skip" setting of 2 or even 1 could be used, but the quality of the random numbers will then decline.
"ExtendedCA" is the default number generator.
| Out[61]= |  |
| Out[62]= |  |
|
Legacy
The
"Legacy" method uses the generator called by
Random in versions of
Mathematica prior to Version 6.0. A Marsaglia-Zaman subtract-with-borrow generator is used for reals. The integer generator is based on a Wolfram rule 30 cellular automaton generator. The rule 30 generator is used directly for small integers and used to generate certain bits for large integers.
| Out[63]= |  |
|
The same values are given by equivalent Random calls.
| Out[64]= |  |
|
To guarantee consistency with sequences generated prior to Version 6.0, seeds set for the
Automatic method are also applied to the
"Legacy" method.
The
"Legacy" method has no options.
MersenneTwister
"MersenneTwister" uses the Mersenne Twister generator due to Matsumoto and Nishimura [
3][
4]. The Mersenne Twister is a generalized feedback shift register generator with period
219937-1.
This gives 5 random numbers from a Mersenne Twister generator.
| Out[65]= |  |
|
The
"MersenneTwister" method has no options.
MKL
The
"MKL" method uses the random number generators provided in Intel's MKL libraries. The MKL libraries are platform dependent. The
"MKL" method is only available on Intel-based systems.
Option for Method "MKL".
| "MCG31" | 31-bit multiplicative congruential generator |
| "MCG59" | 59-bit multiplicative congruential generator |
| "MRG32K3A" | combined multiple recursive generators with two components of order 3 |
| "MersenneTwister" | Mersenne Twister shift register generator |
| "R250" | generalized feedback shift register generator |
| "WH" | Wichmann-Hill combined multiplicative congruential generators |
| "Niederreiter" | Niederreiter low-discrepancy sequence |
| "Sobol" | Sobol low-discrepancy sequence |
"MKL" methods.
The first six methods are uniform generators.
"Niederreiter" and
"Sobol" generate Niederreiter and Sobol sequences. These sequences are nonuniform and have underlying structure which is sometimes useful in numerical methods. For instance, these sequences typically provide faster convergence in multidimensional Monte Carlo integration.
The following shows the structure of a Niederreiter sequence in dimension 2.
| Out[66]= |  |
|
This shows the structure of a Sobol sequence in dimension 2. |