Applications

Image Processing	Random Number Generation
Linear Algebra and List Processing	Code Generation

Because GPUs are SIMD machines, to exploit CUDA's potential you must pose the problem in an SIMD manner. Computation that can be partitioned in such a way that each thread can compute one element independently is ideal for the GPU.

Some algorithms either cannot be written in parallel, or cannot be used on CUDA (due to architecture constraints). In those cases, research is ongoing to introduce alternative methods to use the GPU to perform those computations.

In this section, some uses of CUDA programming inside the Wolfram Language are showcased. All the following examples use CUDAFunctionLoad, which allows you to load CUDA source, binaries, or libraries into the Wolfram Language.

CUDAFunctionLoad	load a CUDA function into the Wolfram Language

Image Processing

This section contains examples of CUDA applications that perform image processing operations. CUDALink contains a few built-in functions that perform image processing operations, such as CUDAImageConvolve, CUDABoxFilter, CUDAErosion, CUDADilation, CUDAOpening, CUDAClosing, CUDAImageAdd, etc.

Image Binarize

Binarize takes an input image and outputs a binary image with pixels set to white if above a threshold and black otherwise.

If you have not done so already, import the CUDALink application.

Wolfram Language code: Needs["CUDALink`"]

This defines the CUDA kernel source as a string. To reduce the memory footprint on the GPU, unsigned char is used to represent the image.

Wolfram Language code:

binarizeImageCode = "__global__ void binarize(unsigned char * in, unsigned char * out, mint threshold, mint width, mint height, mint channels) {
   	mint xIndex = threadIdx.x + blockIdx.x*blockDim.x;
   	mint yIndex = threadIdx.y + blockIdx.y*blockDim.y;
   	mint index = channels*(xIndex + yIndex*width);
   	if (xIndex < width && yIndex < height) {
  		 mint accum = 0;
   		for (mint ii = 0; ii < channels; ii++)
   			accum += in[index+ii];
  		 out[xIndex + yIndex*width] = accum > channels*threshold ? 255 : 0;
   	}
}";
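As a cross-check on the kernel's per-pixel rule, the same comparison can be sketched on the CPU. The name binarizeReference is hypothetical and not part of CUDALink; the sketch assumes the image data is a height × width × channels array of integers in the range 0 to 255.

```wolfram
(* Hypothetical CPU reference for the kernel above; not part of CUDALink. *)
(* A pixel becomes 255 when its channel sum exceeds channels*threshold, and 0 otherwise. *)
binarizeReference[data_?ArrayQ, threshold_Integer] :=
  Map[If[Total[#] > Length[#]*threshold, 255, 0] &, data, {2}]

(* 2 x 2 test image with 3 channels per pixel *)
pixels = {{{200, 180, 160}, {10, 20, 30}}, {{150, 150, 151}, {150, 150, 150}}};
binarizeReference[pixels, 150]
(* {{255, 0}, {255, 0}} *)
```

The comparison is strict, so a pixel whose channels average exactly the threshold maps to 0.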

This loads the CUDAFunction. The unsigned char type is signified by "UnsignedByte" in the parameter list.

Wolfram Language code:

binarizeImage = CUDAFunctionLoad[binarizeImageCode, "binarize", {{"UnsignedByte", "Input"}, {"UnsignedByte", "Output"}, _Integer, _Integer, _Integer, _Integer}, {16, 16}]

This defines the input image and allocates "UnsignedByte" memory for the output.

Wolfram Language code:

input = [image];
{width, height} = ImageDimensions[input];
channels = ImageChannels[input];
output = CUDAMemoryAllocate["UnsignedByte", {height, width}];

This calls the binarize function, using 150 as the threshold value.

Wolfram Language code: binarizeImage[input, output, 150, width, height, channels, {width, height}]

This displays the output image.

Wolfram Language code: Image[output, ImageSize -> Medium]

The result agrees with the Wolfram Language.

Wolfram Language code: Binarize[[image]]

This unloads the memory allocated.

Wolfram Language code: CUDAMemoryUnload[output]

Box Filter

The box filter is an optimized convolution for the case where the kernel is a BoxMatrix. The implementation is in the following file.

Wolfram Language code: srcf = FileNameJoin[{$CUDALinkPath, "SupportFiles", "boxFilter.cu"}]

This loads the CUDA functions needed to perform the box filter.

Wolfram Language code:

boxFilterHorizontal = CUDAFunctionLoad[{srcf}, "d_boxfilter_rgba_x", {{"UnsignedByte", "Input"}, {"UnsignedByte", "Output"}, _Integer, _Integer, _Integer}, {16, 1}, "IncludeDirectories" -> FileNameJoin[{$CUDALinkPath, "SupportFiles"}], "UnmangleCode" -> False];
boxFilterVertical = CUDAFunctionLoad[{srcf}, "d_boxfilter_rgba_y", {{"UnsignedByte", "Input"}, {"UnsignedByte", "Output"}, _Integer, _Integer, _Integer}, {16, 1}, "IncludeDirectories" -> FileNameJoin[{$CUDALinkPath, "SupportFiles"}], "UnmangleCode" -> False];

This sets the input parameters.

Wolfram Language code:

input = CUDAMemoryLoad[[image], "UnsignedByte"];
tmp = CUDAMemoryAllocate["UnsignedByte", {512, 512, 4}];
output = CUDAMemoryAllocate["UnsignedByte", {512, 512, 4}];
width = height = 512;

The radius is set to 5.

Wolfram Language code: radius = 5;

This calls the functions.

Wolfram Language code:

boxFilterHorizontal[input, tmp, width, height, radius];
boxFilterVertical[tmp, output, width, height, radius];

This gets the image.

Wolfram Language code: Image[CUDAMemoryGet[output], "Byte", ImageSize -> Small]

This unloads the memory allocated.

Wolfram Language code: CUDAMemoryUnload[input, output, tmp]
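The two-pass structure used above relies on the box kernel being separable: averaging along rows and then along columns gives the same result as a single two-dimensional box average, while reading 2(2r + 1) values per pixel instead of (2r + 1)^2. A minimal CPU sketch with ListConvolve on a small exact test matrix (all variable names are illustrative only):

```wolfram
r = 1;                                  (* box radius; kernel width is 2 r + 1 *)
w = 2 r + 1;
m = Table[i^2 + 3 j, {i, 6}, {j, 6}];   (* exact integer test data *)

(* One pass with the full w x w box kernel *)
full = ListConvolve[ConstantArray[1/w^2, {w, w}], m];

(* Two passes: a 1 x w horizontal kernel, then a w x 1 vertical kernel *)
horizontal = ListConvolve[ConstantArray[1/w, {1, w}], m];
separable = ListConvolve[ConstantArray[1/w, {w, 1}], horizontal];

full === separable
(* True *)
```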

Image Adjust

This is an implementation of ImageAdjust in CUDA.

Wolfram Language code:

src = "
__device__ mint xclamp(mint val, mint low, mint high) {
	return val <= low ? low : (val >= high ? high : val);
}

__device__ mint adjust(mint pixel, float lowIn, float highIn, float lowOut, float highOut, float gamma) {

	float res, val;
	val = xclamp(pixel, lowIn, highIn);
	
	res = pow((val - lowIn) / (highIn - lowIn), gamma);
	res = res * (highOut - lowOut) + lowOut;
	
	return res + 0.5f;
}

__global__ void imageAdjust(mint * img, mint width, mint height, mint channels, float lowIn, float highIn, float lowOut, float highOut, float gamma) {
	
	int xIndex = threadIdx.x + blockIdx.x * blockDim.x;
	int yIndex = threadIdx.y + blockIdx.y * blockDim.y;
	if (xIndex >= width || yIndex >= height)
		return ;
	int pos = channels * (yIndex * width + xIndex);
	for (mint ii = 0; ii < channels; ii++) {
		img[pos + ii] = adjust(img[pos + ii], 255*lowIn, 255*highIn, 255*lowOut, 255*highOut, gamma);
	}
}";

The CUDAFunction is loaded from the source string, and "Float" is used for the level and gamma parameters.

Wolfram Language code:

cCUDAImageAdjust = CUDAFunctionLoad[src, "imageAdjust", {{_Integer}, _Integer, _Integer, _Integer, "Float", "Float", "Float", "Float", "Float"}, {16, 16}]

This wraps the CUDAFunction to make CUDAImageAdjust with similar syntax to ImageAdjust.

Wolfram Language code:

CUDAImageAdjust[img_Image, {lowIn_Real, highIn_Real}, gamma_ : 1.0] /; Head[gamma] === Real := CUDAImageAdjust[img, {lowIn, highIn}, {0.0, 1.0}, gamma]
CUDAImageAdjust[img_Image, {lowIn_Real, highIn_Real}, {lowOut_Real, highOut_Real}, gamma_ : 1.0] := 
	Module[{width, height, channels}, 
	(* ImageDimensions returns {width, height} *)
	{width, height, channels} = Flatten[{ImageDimensions[img], ImageChannels[img]}];
	cCUDAImageAdjust[img, width, height, channels, lowIn, highIn, lowOut, highOut, gamma, {width, height}]//First
	]

The function can be used similarly to ImageAdjust.

Wolfram Language code: CUDAImageAdjust[[image], {0.3, 0.8}]

Canny Edge Detection

Canny edge detection combines a dozen or so filters to find edges in an image. The Wolfram Language's EdgeDetect provides similar functionality. Here is the implementation.

Wolfram Language code: srcf = FileNameJoin[{$CUDALinkPath, "SupportFiles", "canny.cu"}]

This loads the CUDA functions needed to perform the edge detection.

Wolfram Language code:

gaussianVertical = CUDAFunctionLoad[{srcf}, "gaussianVert_kernel", {{_Integer, "Input"}, {_Integer, "Output"}, _Integer, _Integer, _Integer, _Integer}, {16, 16}];
gaussianHorizontal = CUDAFunctionLoad[{srcf}, "gaussianHoriz_kernel", {{_Integer, "Input"}, {_Integer, "Output"}, _Integer, _Integer, _Integer, _Integer}, {16, 16}];
sobelGXVertical = CUDAFunctionLoad[{srcf}, "sobelGxVert_kernel", {{_Integer, "Input"}, {_Integer, "Output"}, _Integer, _Integer, _Integer, _Integer}, {16, 16}];
sobelGXHorizontal = CUDAFunctionLoad[{srcf}, "sobelGxHoriz_kernel", {{_Integer, "Input"}, {_Integer, "Output"}, _Integer, _Integer, _Integer, _Integer}, {16, 16}];
sobelGYVertical = CUDAFunctionLoad[{srcf}, "sobelGyVert_kernel", {{_Integer, "Input"}, {_Integer, "Output"}, _Integer, _Integer, _Integer, _Integer}, {16, 16}];
sobelGYHorizontal = CUDAFunctionLoad[{srcf}, "sobelGyHoriz_kernel", {{_Integer, "Input"}, {_Integer, "Output"}, _Integer, _Integer, _Integer, _Integer}, {16, 16}];
magnitude = CUDAFunctionLoad[{srcf}, "magnitude_kernel", {{_Integer, "Input"}, {_Integer, "Input"}, {_Integer, "Output"}, _Integer, _Integer, _Integer, _Integer}, {16, 16}];
atan = CUDAFunctionLoad[{srcf}, "atan_kernel", {{_Integer, "Input"}, {_Integer, "Input"}, {"Float", "Output"}, _Integer, _Integer, _Integer, _Integer}, {16, 16}];
zeroCrossing = CUDAFunctionLoad[{srcf}, "zeroCrossing_kernel", {{"Float", "Input"}, {_Integer, "Output"}, _Integer, _Integer, _Integer, _Integer}, {16, 16}];
nonMaximalSupression = CUDAFunctionLoad[{srcf}, "nonMaximalSuppression_kernel", {{_Integer, "Input"}, {_Integer, "Input"}, {_Integer, "Output"}, _Integer, _Integer, _Integer, _Integer}, {16, 16}];
hysteresis = CUDAFunctionLoad[{srcf}, "hysteresis_kernel", {{_Integer, "Input"}, {_Integer, "Output"}, _Integer, _Integer, _Integer, _Integer}, {16, 16}];
binarize = CUDAFunctionLoad[{srcf}, "binarize_kernel", {{_Integer, "Input"}, {_Integer, "Output"}, _Integer, _Integer, _Integer, _Integer}, {16, 16}];

This sets the input image along with its parameters.

Wolfram Language code:

img = [image];
{{width, height}, channels} = {ImageDimensions[img], ImageChannels[img]};
pitch = width * channels;

Now add host and device tensors to the CUDA manager. These will hold both the input and the output.

Wolfram Language code:

inhost = CUDAMemoryLoad[img, Integer];
outhost = CUDAMemoryAllocate[Integer, {width, height, channels}];

Next, define device-only temporary memory that will be used in the computation. Since Canny edge detection involves many filters, you need quite a few of them.

Wolfram Language code:

tmpdev = CUDAMemoryAllocate[Integer, {width, height, channels}];
gxdev = CUDAMemoryAllocate[Integer, {width, height, channels}];
gydev = CUDAMemoryAllocate[Integer, {width, height, channels}];
cdev = CUDAMemoryAllocate[Integer, {width, height, channels}];
magdev = CUDAMemoryAllocate[Integer, {width, height, channels}];
tandev = CUDAMemoryAllocate["Float", {width, height, channels}];

This calls the Canny edge-detection functions.

Wolfram Language code:

gaussianVertical[inhost, tmpdev, width, height, channels, pitch];
gaussianHorizontal[tmpdev, outhost, width, height, channels, pitch];
sobelGXHorizontal[outhost, tmpdev, width, height, channels, pitch];
sobelGXVertical[tmpdev, gxdev, width, height, channels, pitch];
sobelGYHorizontal[outhost, tmpdev, width, height, channels, pitch];
sobelGYVertical[tmpdev, gydev, width, height, channels, pitch];
magnitude[gxdev, gydev, magdev, width, height, channels, pitch];
atan[gxdev, gydev, tandev, width, height, channels, pitch];
zeroCrossing[tandev, cdev, width, height, channels, pitch];
nonMaximalSupression[magdev, cdev, tmpdev, width, height, channels, pitch];
hysteresis[tmpdev, outhost, width, height, channels, pitch];
binarize[outhost, outhost, width, height, channels, pitch];

This views the output as an image.

Wolfram Language code: Image[outhost]

This unloads the memory allocated.

Wolfram Language code: CUDAMemoryUnload[inhost, outhost, tmpdev, gxdev, gydev, cdev, magdev, tandev]

Linear Algebra and List Processing

This section contains examples of CUDA applications that perform linear algebra operations. Most of the functions discussed can be performed using CUDATranspose or CUDADot.

Matrix Transpose

Matrix transposition is essential in many algorithms. CUDALink provides a ready-made implementation in the form of CUDATranspose. Users may wish to implement their own, however.

This loads the CUDAFunction for matrix transposition and defines a new function newCUDATranspose that takes a real-valued matrix and outputs its transpose.

Wolfram Language code:

newCUDATranspose[matrix_] := Module[
	{dInputMatrix, dOutputMatrix, hOutputMatrix, blockDim, transposeFun}, 
	
	blockDim = {16, 16};
	
	dInputMatrix = CUDAMemoryLoad[matrix];
	dOutputMatrix = CUDAMemoryAllocate[Real, Reverse@ Dimensions[matrix]];
	
	transposeFun = CUDAFunctionLoad[{FileNameJoin[{$CUDALinkPath, "SupportFiles", "transpose.cu"}]}, "transpose_kernel", {{_Real, "Input"}, {_Real, "Output"}, "Integer32", "Integer32"}, blockDim];
	
	transposeFun[dInputMatrix, dOutputMatrix, Sequence@@Reverse[Dimensions[matrix]]];
	
	hOutputMatrix = CUDAMemoryGet[dOutputMatrix];
	
	CUDAMemoryUnload[dInputMatrix];
	CUDAMemoryUnload[dOutputMatrix];
	
	Return[hOutputMatrix];
	];

This sets a matrix.

Wolfram Language code: MatrixForm[A = Table[1.0 * i - j, {i, 0, 3}, {j, 0, 8}]]

This transposes matrix A.

Wolfram Language code: MatrixForm[newCUDATranspose[A]]

The result agrees with the Wolfram Language.

Wolfram Language code: MatrixForm[Transpose[A]]
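In a transpose kernel, each GPU thread typically copies one element from flat index i*cols + j to flat index j*rows + i (row-major storage). The sketch below mirrors that index mapping on the CPU. transposeFlat is a hypothetical helper, and the actual indexing inside transpose.cu is not shown in this document, so treat the mapping as an assumption.

```wolfram
transposeFlat[flat_List, rows_Integer, cols_Integer] :=
  Module[{out = ConstantArray[0, rows*cols]},
    Do[
      (* element (i, j) of the input goes to (j, i) of the output; i and j are 0-based *)
      out[[j*rows + i + 1]] = flat[[i*cols + j + 1]],
      {i, 0, rows - 1}, {j, 0, cols - 1}];
    out]

m = {{1, 2, 3}, {4, 5, 6}};
Partition[transposeFlat[Flatten[m], 2, 3], 2]
(* {{1, 4}, {2, 5}, {3, 6}} *)
```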

Matrix-Vector Multiplication

Matrix-vector multiplication is a common operation in linear algebra, finite element analysis, etc. This loads CUDAFunction, implementing matrix-vector multiplication.

Wolfram Language code:

CUDAMatrixVectorMultiply[matrix_, vector_] := Module[
	{pmv, blockDim, gridDim, dInputMatrix, dInputVector, dOutputVector, hOutputVector}, 
	
	blockDim = 16;
	gridDim = First@Dimensions[matrix];
	
	dInputMatrix = CUDAMemoryLoad[matrix];
	dInputVector = CUDAMemoryLoad[vector];
	dOutputVector = CUDAMemoryAllocate[Integer, Length@vector];
	
	pmv = CUDAFunctionLoad[{FileNameJoin[{$CUDALinkPath, "SupportFiles", "matrixVectorMul.cu"}]}, "matrixVecMul_kernel", {{_Integer, _, "InputOutput"}, {_Integer, _, "Input"}, {_Integer, _, "Input"}, _Integer, _Integer}, blockDim];
	
	pmv[dOutputVector, dInputMatrix, dInputVector, Sequence@@Reverse[Dimensions[matrix]], gridDim];
	
	hOutputVector = CUDAMemoryGet[dOutputVector];
	
	CUDAMemoryUnload /@ {dOutputVector, dInputMatrix, dInputVector};
	
	Return[hOutputVector];
	];

This sets the input matrix and vector.

Wolfram Language code:

A = Table[i + j, {i, 0, 15}, {j, 0, 15}];
B   = Table[i, {i, 0, 15}];

This invokes the above defined function, displaying the result using MatrixForm.

Wolfram Language code: MatrixForm[CUDAMatrixVectorMultiply[A, B]]

The result agrees with the Wolfram Language.

Wolfram Language code: MatrixForm[A.B]
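The product decomposes into one independent dot product per matrix row, which is what makes it a natural fit for parallel execution. A CPU sketch of that decomposition follows; rowwiseMultiply is a hypothetical name, and how matrixVectorMul.cu actually distributes the rows over threads is not shown here.

```wolfram
rowwiseMultiply[matrix_?MatrixQ, vector_?VectorQ] :=
  Table[Total[matrix[[i]]*vector], {i, Length[matrix]}]   (* one dot product per row *)

a = Table[i + j, {i, 0, 3}, {j, 0, 3}];
b = Range[0, 3];
rowwiseMultiply[a, b]
(* {14, 20, 26, 32} *)
```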

Matrix-Matrix Multiplication

Matrix-matrix multiplication is a pivotal function in many algorithms. This loads CUDAFunction from a source file, setting the block dimension to 4.

Wolfram Language code:

blockSize = 4;
MatrixMultiply = CUDAFunctionLoad[{FileNameJoin[{$CUDALinkPath, "SupportFiles", "matrixMul.cu"}]}, "matrixMul", {{_Real, 2, "Output"}, {_Real, 2, "Input"}, {_Real, 2, "Input"}, _Integer, _Integer}, {blockSize, blockSize}, "Defines" -> {"BLOCK_SIZE" -> blockSize}];

This sets input values and allocates memory for the output.

Wolfram Language code:

A = RandomReal[1.0, {8, 8}];
B = RandomReal[1.0, {8, 8}];
out = CUDAMemoryAllocate[Real, {8, 8}];

This performs the computation.

Wolfram Language code: MatrixMultiply[out, A, B, Sequence@@Dimensions[A]];

This displays the result using MatrixForm.

Wolfram Language code: CUDAMemoryGet[out]//MatrixForm

The result agrees with the Wolfram Language.

Wolfram Language code: Dot[A, B]//MatrixForm

This unloads the output buffer.

Wolfram Language code: CUDAMemoryUnload[out]
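The BLOCK_SIZE define suggests that the kernel works on square tiles, as tiled GPU matrix multiplication commonly does; the contents of matrixMul.cu are not shown here, so this is an assumption. The idea can be sketched on the CPU by splitting both matrices into tiles and accumulating tile products. blockedMultiply is a hypothetical name, and the sketch assumes every dimension is a multiple of the tile size.

```wolfram
blockedMultiply[a_?MatrixQ, b_?MatrixQ, bs_Integer] :=
  Module[{ta, tb, tiles},
    ta = Partition[a, {bs, bs}];   (* matrix of bs x bs tiles of a *)
    tb = Partition[b, {bs, bs}];   (* matrix of bs x bs tiles of b *)
    (* tile (i, j) of the product is the sum over k of ta[[i, k]] . tb[[k, j]] *)
    tiles = Table[
      Sum[ta[[i, k]] . tb[[k, j]], {k, Length[tb]}],
      {i, Length[ta]}, {j, Length[First[tb]]}];
    ArrayFlatten[tiles]]

a = Table[i + 2 j, {i, 8}, {j, 8}];
b = Table[i*j - 1, {i, 8}, {j, 8}];
blockedMultiply[a, b, 4] === a . b
(* True *)
```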

Dot Product

The dot product of two vectors is a common operation in linear algebra. This implements a function that takes two sets of vectors and gives the dot product of each corresponding pair.

Wolfram Language code: srcf = FileNameJoin[{$CUDALinkPath, "SupportFiles", "dotProduct.cu"}]

This loads the function into the Wolfram Language.

Wolfram Language code:

dotProduct = CUDAFunctionLoad[{srcf}, "scalarProdGPU", {{_Real, "Output"}, {_Real, "Input"}, {_Real, "Input"}, _Integer, _Integer}, 1024]

Generate two sets of 50 random vectors, each of length 50.

Wolfram Language code:

A = RandomReal[1.0, {50, 50}];
B = RandomReal[1.0, {50, 50}];
out = ConstantArray[0.0, First[Dimensions[A]]];

This calls the function, returning the result.

Wolfram Language code: dotProduct[out, A, B, Sequence@@Dimensions[A], Length[out]]//First

The results agree with the Wolfram Language's output.

Wolfram Language code: MapThread[Dot, {A, B}]
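Because every pair of rows is independent, the whole batch can also be written as an elementwise product followed by a sum along each row, which is the multiply-then-reduce pattern that parallelizes well. A small CPU check (batchDot is a hypothetical name):

```wolfram
batchDot[a_?MatrixQ, b_?MatrixQ] := Total[a*b, {2}]   (* sum each row of the elementwise product *)

a = {{1, 2, 3}, {4, 5, 6}};
b = {{7, 8, 9}, {1, 0, -1}};
batchDot[a, b]
(* {50, -2} *)
```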

Convex Hull

Convex hull algorithms are hard to parallelize on the GPU, so this example takes a hybrid approach to GPU programming: the parts of the computation that suit the GPU run there, and the rest runs on the CPU.

This convex hull implementation is a textbook implementation of Andrew's algorithm and is designed to be simple, not efficient.
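For reference, Andrew's monotone chain algorithm can be written entirely on the CPU in a few lines. The sketch below is independent of the CUDA-assisted version that follows, and cross, halfHull, and monotoneChainHull are hypothetical names. It can serve as a check on the output of a GPU-assisted implementation.

```wolfram
(* z-component of (a - o) x (b - o); positive for a counterclockwise turn *)
cross[o_, a_, b_] :=
  (a[[1]] - o[[1]])*(b[[2]] - o[[2]]) - (a[[2]] - o[[2]])*(b[[1]] - o[[1]])

(* Build one chain, dropping points that would make a clockwise or straight turn *)
halfHull[pts_List] := Fold[
  Function[{hull, p},
    Module[{h = hull},
      While[Length[h] >= 2 && cross[h[[-2]], h[[-1]], p] <= 0, h = Most[h]];
      Append[h, p]]],
  {}, pts]

monotoneChainHull[pts_List] :=
  Module[{s = Sort[DeleteDuplicates[pts]]},
    If[Length[s] < 3,
      s,
      (* lower chain from the sorted points, upper chain from the reversed points *)
      Join[Most[halfHull[s]], Most[halfHull[Reverse[s]]]]]]

monotoneChainHull[{{0, 0}, {2, 0}, {2, 2}, {0, 2}, {1, 1}}]
(* {{0, 0}, {2, 0}, {2, 2}, {0, 2}} *)
```

The hull is returned in counterclockwise order, starting from the leftmost point.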

Define the LeftOf predicate in the Wolfram Language.

Wolfram Language code:

pdirection[pa_List, pb_List, px_List] := ((pa[[1]] - px[[1]]) * (pb[[2]] - px[[2]])) - ((pb[[1]] - px[[1]]) * (pa[[2]] - px[[2]]))

This defines a function that splits the point set to be either above or below a given line connecting the extreme points.

Wolfram Language code:

code = "
__global__ void partitionPts(float *inpts, mint *idx, mint length) {
	int index = threadIdx.x + blockIdx.x*blockDim.x;
	int length1 = length-1;
	int index2, tindex;
	Real_t  leftx, lefty;
	Real_t  rightx, righty;
	Real_t  xx, yy, det;

	if(index == 0 || index == length1) {
		idx[index] = -1;
		return;
	}
	if(index >= length)
		return;

	tindex = 2*length1;
	index2 = 2*index;
	leftx = inpts[0];
	lefty = inpts[1];
	rightx = inpts[tindex];
	righty = inpts[tindex+1];

	xx = inpts[index2];
	yy = inpts[index2+1];

	det = ((leftx - xx)*(righty - yy)) - ((rightx - xx)*(lefty - yy));

	if(det >=0)
		idx[index] = 1;
	else
		idx[index] = 0;
}";

This loads the above function as a CUDA function.

Wolfram Language code: partitionPts = CUDAFunctionLoad[code, "partitionPts", {{"Float[2]", _, "Input"}, {_Integer}, _Integer}, 32];
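The classification performed by the kernel can be mirrored on the CPU to check its output. partitionReference is a hypothetical helper; like the kernel, it assumes the points are already sorted so that the first and last points are the extremes.

```wolfram
partitionReference[sorted_List] :=
  Module[{left = First[sorted], right = Last[sorted], flags},
    (* same determinant as in the kernel: nonnegative means on or above the line *)
    flags = Map[If[Det[{left - #, right - #}] >= 0, 1, 0] &, sorted];
    (* the kernel marks the two extreme points with -1 *)
    flags[[1]] = -1; flags[[-1]] = -1;
    flags]

partitionReference[{{0, 0}, {1, 1}, {2, -1}, {3, 0}}]
(* {-1, 1, 0, -1} *)
```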

Define a Wolfram Language function that takes two split point sets and finds the convex hull for them.

Wolfram Language code:

makeHalfHull[lcoord_List, ppart_List, factor_] := 
	Module[{hull, length, ii, ihull, idx, pa, pb, px, ptop}, 
		length = Length[ppart];
		hull = ConstantArray[0, length];
		hull[[1]] = 1;
	
		ii = 2; ihull = 1;
		While[ii ≤ length, 
			ihull++;
			hull[[ihull]] = ppart[[ii]];
			While[ihull ≥ 3, 
				idx = {hull[[ihull - 2]], hull[[ihull]], hull[[ihull - 1]]};
				{pa, pb, px} = Part[lcoord, idx];
				If[factor * pdirection[pa, pb, px] < 0.0, 
					hull[[ihull - 1]] = hull[[ihull]];
					hull[[ihull]] = 0;
					ihull--, 
					Break[]
				]
			];
			ii++
		];
		ptop = Position[hull, 0];
		If[ptop  == {}, Return[hull]];
		Take[hull, First[ptop][[1]] - 1]
	]

This calls the above function. Note that the list is sorted with CUDASort before being processed by partitionPts.

Wolfram Language code:

CUDAConvexHull[pts_List] := 
	Module[{mem, slst, ilst, imem, ilength, ppts, upts, lpts, lhull, uhull, ihull}, 
		ilength = Length[pts];
	
		(* CUDA Sort points *)
		mem = CUDAMemoryLoad[pts, "Float[2]"];
		CUDASort[mem];
		slst = CUDAMemoryGet[mem];
	
		(* CUDA Classify points as bottom/top, upts and lpts *)
		ilst = ConstantArray[0, ilength];
		imem = CUDAMemoryLoad[ilst, Integer];
		partitionPts[mem, imem, ilength, ilength];
		ppts = CUDAMemoryGet[imem];
		ppts[[1]] = 1;ppts[[ilength]] = 1;
		upts = Flatten[Position[ppts, 1]];
		ppts[[1]] = 0;ppts[[ilength]] = 0;
		lpts = Flatten[Position[ppts, 0]];
	
		(* Get lower and upper half hulls *)
		lhull  = makeHalfHull[slst, lpts, -1.0];
		uhull  = makeHalfHull[slst, upts, 1.0];
		
		(* Construct Final ConvexHull *)
		If[Length[lhull] == 0, 
			ihull = uhull, 
			If[Length[uhull] == 0, 
			ihull = lhull, 
			ihull = Join[lhull, Reverse[Drop[uhull, -1]]]
		]
		];
	
		(* Delete CUDA Memory *)
		CUDAMemoryUnload[mem, imem];
	
		(* Get result *)
		Part[slst, ihull]
		]

To test, create 20,000 uniformly distributed random points.

Wolfram Language code:

npts = 20000;
lst = RandomReal[10, {npts, 2}];

This computes the hull.

Wolfram Language code: hullpts = CUDAConvexHull[lst];

This visualizes the result. Lines are drawn between the hull points.

Wolfram Language code: Graphics[{Green, Point[lst], Magenta, Line[hullpts]}]

The above algorithm handles the extreme case where all or most points lie on the hull. This generates random points on the unit circle.

Wolfram Language code:

npts = 500;
lst = Table[pt = RandomReal[{-10, 10}, 2];pt / Norm[pt], {i, npts}];

This computes the hull points.

Wolfram Language code: hullpts = CUDAConvexHull[lst];

This visualizes the hull.

Wolfram Language code: Graphics[{Green, Point[lst], Magenta, Line[hullpts]}]

The above is a prime example of combining Wolfram Language and CUDA programming to make an algorithm partially parallel that would otherwise be written in only serial code.

Random Number Generation

Many algorithms, ranging from ray tracing to PDE solving, require random numbers as input. This is, however, a difficult problem on many-core systems, where each core has the same state as any other core. To avoid that, parallel random number generators usually use entropy values such as the time of day to seed the random number generators, but those calls are not available in CUDA.

The following section gives three classes of algorithms to generate random numbers. The first is uniform random number generators (where pseudorandom number generators and quasi-random number generators are showcased). The second is random number generators that exploit the uniform distribution of a hashing function to generate random numbers. The final is normal random number generators.

Pseudorandom Number Generators

Pseudorandom number generators are deterministic algorithms that generate numbers that appear to be random. They rely on the seed value to generate further random numbers.

In this section, a simple linear congruential random number generator (the Park–Miller algorithm) is shown, along with a more complex Mersenne Twister.

Park–Miller

The Park–Miller random number generator is defined by the following recurrence equation:

$$x_{k+1} = a \, x_k \bmod n$$

It can be implemented easily in the Wolfram Language.

Wolfram Language code: ParkMiller[x_, a_ : 16807, n_ : 2147483647] := Mod[x * a, n]

Here, common values for a and n are used (a = 16807 and n = 2^31 - 1 = 2147483647). Using NestList, you can generate a list of 1000 numbers and plot them.

Wolfram Language code: ListPlot[NestList[ParkMiller, 1, 1000]]
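As a quick check of the recurrence, the first few values from seed 1 can be computed directly. parkMillerStep is a hypothetical name for the same update with the constants written inline.

```wolfram
parkMillerStep[x_Integer] := Mod[16807*x, 2147483647]

NestList[parkMillerStep, 1, 3]
(* {1, 16807, 282475249, 1622650073} *)

(* Several independent streams can be advanced together by mapping over a list of seeds *)
NestList[Map[parkMillerStep, #] &, {1, 2}, 2]
(* {{1, 2}, {16807, 33614}, {282475249, 564950498}} *)
```

Advancing many seeds in lockstep is the general shape of a parallel version, where each thread would own one stream; how random.cu assigns seeds to threads is not shown in this document.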

Here is the timing to generate 10 million numbers.

Wolfram Language code: AbsoluteTiming[NestList[ParkMiller, 1, 10000000];]

An alternative is to use the Method option in SeedRandom; this can be used in the same manner as before.

Wolfram Language code:

ListPlot[BlockRandom[SeedRandom[1, Method -> {"Congruential", "Multiplier" -> 16807, "Increment" -> 0, "Modulus" -> 2147483647}];RandomInteger[2 ^ 31 - 1, 1000]]]

This is around 300 times faster than the NestList implementation above.

Wolfram Language code:

AbsoluteTiming[BlockRandom[SeedRandom[1, Method -> {"Congruential", "Multiplier" -> 16807, "Increment" -> 0, "Modulus" -> 2147483647}];RandomInteger[2 ^ 31 - 1, 10000000]];]

The CUDA implementation is similar to the one written in the Wolfram Language. The implementation is distributed along with CUDALink, and the location is shown below.

Wolfram Language code: srcf = FileNameJoin[{$CUDALinkPath, "SupportFiles", "random.cu"}]

This loads the CUDA function into the Wolfram Language.

Wolfram Language code:

CUDAParkMiller = CUDAFunctionLoad[{srcf}, "ParkMiller", {{_Integer, _, "Input"}, {_Integer, _, "Output"}, _Integer}, 256]

This allocates the output memory.

Wolfram Language code: mem = CUDAMemoryAllocate[Integer, {2048}]

This calls CUDAFunction.

Wolfram Language code: CUDAParkMiller[RandomInteger[Developer`$MaxMachineInteger, 256], mem, 2048]

The output appears random.

Wolfram Language code: ListPlot[CUDAMemoryGet[mem]]

If you measure the timing, you notice that it is twice as fast as the Wolfram Language's built-in method and 600 times faster than the pure Wolfram Language implementation.

Wolfram Language code: mem = CUDAMemoryAllocate[Integer, {10000000}]

Wolfram Language code: AbsoluteTiming[CUDAParkMiller[RandomInteger[Developer`$MaxMachineInteger, 256], mem, 10000000];]

The timing of a function generated with Compile is similar to that of CUDA. This generates C code from a Compile statement with no integer overflow detection.

Wolfram Language code:

CompiledParkMiller = Compile[{{n, _Integer}}, NestList[# * 16807&, 1, n], RuntimeOptions -> {"CatchMachineIntegerOverflow" -> False}, CompilationTarget -> "C"];

This finds the timing. Notice that there is little difference.

Wolfram Language code: AbsoluteTiming[CompiledParkMiller[50000000];]

If you ignore the time it takes for memory allocation (and you can rightly do so in this case, since generated random numbers are usually reused on the GPU), you notice a 10× speed improvement.

Wolfram Language code:

mem = CUDAMemoryAllocate[Integer, {10000000}];
CUDAMemoryCopyToDevice[mem];
AbsoluteTiming[CUDAParkMiller[RandomInteger[Developer`$MaxMachineInteger, 512], mem, 10000000];]

Mersenne Twister

Mersenne Twister utilizes shift registers to generate random numbers. Because the implementation is simple, it maps well to the GPU. The following file contains the implementation.

Wolfram Language code: srcf = FileNameJoin[{$CUDALinkPath, "SupportFiles", "random.cu"}]

This loads the "MersenneTwister" function from the file.

Wolfram Language code:

mersenneTwister = CUDAFunctionLoad[{srcf}, "MersenneTwister", {{_Real, _, "Output"}, {_Integer, _, "Input"}, {_Integer, _, "Input"}, {_Integer, _, "Input"}, {_Integer, _, "Input"}, _Integer}, 32];

Here, the Mersenne Twister input parameters are defined. The twister requires seed values. Those values can be computed offline and stored in a file, or they can be generated by the Wolfram Language. The latter is shown here.

Wolfram Language code:

MTRNGCount = 4096;
PATHN = 2 ^ 14;
NPerRNG = Ceiling[PATHN / MTRNGCount];
NPerRNG = If[EvenQ[NPerRNG], NPerRNG, NPerRNG + 1];
RANDN = MTRNGCount * NPerRNG;
{hsMatrixA, hsMaskB, hsMaskC} = RandomInteger[{-Developer`$MaxMachineInteger, Developer`$MaxMachineInteger}, {3, MTRNGCount}];
hsSeed = RandomInteger[{-Developer`$MaxMachineInteger, Developer`$MaxMachineInteger}, MTRNGCount];

This allocates the output memory. Since the output will be overwritten, there is no need to load memory from the Wolfram Language onto the GPU.

Wolfram Language code: output = CUDAMemoryAllocate[Real, RANDN]

This invokes CUDAFunction with parameters.

Wolfram Language code: mersenneTwister[output, hsMatrixA, hsMaskB, hsMaskC, hsSeed, NPerRNG, MTRNGCount]

The output can be plotted to show it is random.

Wolfram Language code: ListPlot[CUDAMemoryGet[output]]

A Wolfram Language function can be written that takes the number of random numbers to be generated as input, performs the required allocations and setting of parameters, and returns the random output memory.

Wolfram Language code:

MersenneTwister[n_] := Module[{MTRNGCount, PATHN, NPerRNG, RANDN, hsMatrixA, hsMaskB, hsMaskC, hsSeed, output}, 
	MTRNGCount = 4096;
	PATHN = n;
	NPerRNG = Ceiling[PATHN / MTRNGCount];
	NPerRNG = If[EvenQ[NPerRNG], NPerRNG, NPerRNG + 1];
	RANDN = MTRNGCount * NPerRNG;
	{hsMatrixA, hsMaskB, hsMaskC} = RandomInteger[{-Developer`$MaxMachineInteger, Developer`$MaxMachineInteger}, {3, MTRNGCount}];
	hsSeed = RandomInteger[{-Developer`$MaxMachineInteger, Developer`$MaxMachineInteger}, MTRNGCount];
	output = CUDAMemoryAllocate[Real, RANDN];
	First@mersenneTwister[output, hsMatrixA, hsMaskB, hsMaskC, hsSeed, NPerRNG, 128]
	]

This generates a plot of the first 10,000 elements.

Wolfram Language code: ListPlot[CUDAMemoryGet[MersenneTwister[10000]][[ ;; 10000]]]
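The function rounds the requested count up so that every generator produces the same even number of values, which is why the plot above takes only the first 10,000 elements of the output. The arithmetic can be isolated in a small helper (paddedCount is a hypothetical name):

```wolfram
paddedCount[n_Integer, rngCount_Integer : 4096] :=
  Module[{perRNG = Ceiling[n/rngCount]},
    If[OddQ[perRNG], perRNG++];   (* force an even count per generator *)
    rngCount*perRNG]

paddedCount[10000]
(* 16384 *)

paddedCount[2^14]
(* 16384 *)
```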

The following measures the time it takes for the random number generator to generate 100 million numbers.

Wolfram Language code: MersenneTwister[100000000];//AbsoluteTiming

This is on par with the Wolfram Language's random number generator timings.

Wolfram Language code: BlockRandom[SeedRandom[1, Method -> "MersenneTwister"];RandomReal[1, 100000000]];//AbsoluteTiming

Because random numbers typically serve as input to other computations, a user may still see a performance increase in the overall algorithm when those numbers are consumed on the GPU, even where the Wolfram Language's timings are superior to the CUDA implementation.

Quasi-Random Number Generators

This section describes quasi-random number generators. Unlike pseudorandom number generators, these sequences do not try to appear random; they have an underlying structure that is sometimes useful in numerical methods. For instance, these sequences typically provide faster convergence in multidimensional Monte Carlo integration.

Halton Sequence

The Halton sequence generates quasi-random numbers that are uniform on the unit interval. While the code works with arbitrary dimensions, only the van der Corput sequence is discussed, which works on 1D space. This is adequate for comparison.

The resulting numbers of the Halton (or van der Corput) sequence are deterministic but have low discrepancy over the unit interval. Because they fill the space uniformly, they are preferred to pseudorandom number generators in some applications, such as Monte Carlo integration.

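For a given integer $n$ with base-$b$ representation:

$$n = \sum_{k=0}^{L-1} d_k(n)\, b^k$$

The one-dimensional Halton (or van der Corput) value in base $b$ is:

$$g_b(n) = \sum_{k=0}^{L-1} d_k(n)\, b^{-k-1}$$

The sequence of length $N$ is then written as:

$$\{g_b(1), g_b(2), \ldots, g_b(N)\}$$

Given a number with base-$b$ representation $d_{L-1} \cdots d_1 d_0$, the van der Corput sequence mirrors the number across the radix point, so that its sequence value is $0.d_0 d_1 \cdots d_{L-1}$.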

In the Wolfram Language, you can find the sequence using IntegerDigits.

Wolfram Language code:

VanDerCorput[base_][len_] := Table[
	With[{digits = Reverse@IntegerDigits[n, base]}, 
	Sum[base ^ (-ii) * digits[[ii]], {ii, Length[digits]}]
	], {n, len}]

Setting the base to 2, you can calculate the first 1000 elements in the sequence.

Wolfram Language code: x = VanDerCorput[2][1000];

This plots the result; notice how it fills the space uniformly.

Wolfram Language code: ListPlot[%]
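The digit-mirroring rule is easy to verify with exact arithmetic. radicalInverse is a hypothetical helper that reverses the base-b digits of n and weights them by negative powers of b.

```wolfram
radicalInverse[n_Integer, b_Integer] :=
  With[{d = Reverse[IntegerDigits[n, b]]},
    d . (b^(-Range[Length[d]]))]   (* the digit in position k contributes d_k * b^(-k) *)

Table[radicalInverse[n, 2], {n, 7}]
(* {1/2, 1/4, 3/4, 1/8, 5/8, 3/8, 7/8} *)

Table[radicalInverse[n, 3], {n, 5}]
(* {1/3, 2/3, 1/9, 4/9, 7/9} *)
```

Each new value lands in the largest remaining gap, which is how the sequence fills the unit interval evenly.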

A property of low-discrepancy sequences is that the next elements in the sequence know where the previous elements are positioned. This can be shown with Manipulate.

Wolfram Language code:

Manipulate[
	ListPlot[VanDerCorput[2][n]], {n, 10, 10000, 10}]

For the CUDA implementation, you have to implement your own version of IntegerDigits, but this is not difficult. First, load the implementation source code.

Wolfram Language code: srcf = FileNameJoin[{$CUDALinkPath, "SupportFiles", "random.cu"}]

This loads CUDAFunction.

Wolfram Language code: CUDAHaltonSequence = CUDAFunctionLoad[{srcf}, "Halton", {{_Real, _, "Output"}, _Integer, _Integer}, 256]

This allocates memory for the output. Here, only 1024 random numbers are generated.

Wolfram Language code: mem = CUDAMemoryAllocate[Real, {1024}]

This runs the function for dimension 1.

Wolfram Language code: CUDAHaltonSequence[mem, 1, 1024]

This plots the results.

Wolfram Language code: ListPlot[CUDAMemoryGet[mem]]

Sobol Sequence

The Sobol sequence is also a low-discrepancy sequence. It is implemented in the following CUDA file.

Wolfram Language code: srcf = FileNameJoin[{$CUDALinkPath, "SupportFiles", "random.cu"}]

This loads the function, using {64,1} as the block dimension.

Wolfram Language code:

sobolFun = CUDAFunctionLoad[{srcf}, "Sobol", {_Integer, _Integer, {_Integer, _, "Input"}, {_Real, _, "Output"}}, {64, 1}]

Here, the input parameters are loaded. The direction vectors needed by the Sobol sequence are precomputed and stored in a file.

Wolfram Language code:

vectorCount = 100000;
dimension = 100;
directions = CUDAMemoryLoad[Flatten[Import[FileNameJoin[{$CUDALinkExampleDataPath, "soboldirection.txt"}], "Data"]]];
output = CUDAMemoryAllocate[Real, vectorCount * dimension];
gridDim = {64, dimension};

This executes the Sobol function, passing parameters.

Wolfram Language code: sobolFun[vectorCount, dimension, directions, output, gridDim];

This plots the first 10,000 values in the sequences. Note that the space is filled evenly with points (a property of quasi-random number generators).

Wolfram Language code: ListPlot[Take[CUDAMemoryGet[output], {1, 10000}]]

When complete, the memory must be unloaded.

Wolfram Language code: CUDAMemoryUnload[output, directions]

Hashing Random Number Generators

Random number generators that depend on hashing generate random numbers of lesser quality, but they generate them fast. For many applications, they are more than adequate.

Tiny Encryption Algorithm Hashing

The Tiny Encryption Algorithm (TEA) is a very simple block cipher that can be used as a hash function. It is implemented in the following file.

Wolfram Language code: srcf = FileNameJoin[{$CUDALinkPath, "SupportFiles", "random.cu"}]

This loads CUDAFunction.

Wolfram Language code: CUDATeaEncryption = CUDAFunctionLoad[{srcf}, "Tea", {{_Integer, _, "Output"}, _Integer}, 256]

This allocates memory for the output.

Wolfram Language code: mem = CUDAMemoryAllocate[Integer, {2048}]

This calls CUDAFunction.

Wolfram Language code: CUDATeaEncryption[mem, 2048]

This plots the result.

Wolfram Language code: ListPlot[CUDAMemoryGet[mem]]

This deletes allocated memory.

Wolfram Language code: CUDAMemoryUnload[mem]
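
To illustrate how a TEA-style hash turns a counter into scrambled 32-bit words, here is a minimal CPU sketch of the standard TEA round function using 32-bit modular arithmetic. The name teaHash, the key constants, the default of 16 rounds, and the use of the element index as input are all assumptions made for illustration; the kernel in random.cu may differ in each of these.

```wolfram
(* One pass of TEA over the pair {v0, v1}; all arithmetic is modulo 2^32. *)
teaHash[{v0in_Integer, v1in_Integer}, rounds_Integer : 16] :=
  Module[{v0 = v0in, v1 = v1in, sum = 0, m = 2^32,
    delta = 16^^9E3779B9, (* standard TEA round constant *)
    k = {16^^A341316C, 16^^C8013EA4, 16^^AD90777D, 16^^7E95761E} (* arbitrary key, assumed *)},
   Do[
    sum = Mod[sum + delta, m];
    v0 = Mod[v0 + BitXor[Mod[BitShiftLeft[v1, 4] + k[[1]], m],
        Mod[v1 + sum, m], Mod[BitShiftRight[v1, 5] + k[[2]], m]], m];
    v1 = Mod[v1 + BitXor[Mod[BitShiftLeft[v0, 4] + k[[3]], m],
        Mod[v0 + sum, m], Mod[BitShiftRight[v0, 5] + k[[4]], m]], m],
    {rounds}];
   {v0, v1}]

(* Hash the indices 0..2047 and scale the first word of each result to [0, 1). *)
hashes = First[teaHash[{#, 0}]] & /@ Range[0, 2047];
uniforms = N[hashes/2^32];
```

Because each output depends only on its own index, every element can be computed by an independent thread, which is what makes hashing generators a good fit for the GPU.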

MD5 Hashing

Other general hashing methods can be used for random number generators. Here is an implementation of the MD5 algorithm—a well-known hashing algorithm.

Wolfram Language code: srcf = FileNameJoin[{$CUDALinkPath, "SupportFiles", "md5_rand.cu"}]

This loads CUDAFunction from the source.

Wolfram Language code: md5 = CUDAFunctionLoad[{srcf}, "gen_randMD5", {{"Integer32[4]", _, "Output"}, _Integer, _Integer}, 32]

This loads the output memory.

Wolfram Language code: mem = CUDAMemoryLoad[ConstantArray[0, {1024 * 4}], "Integer32[4]"]

This calls CUDAFunction.

Wolfram Language code: md5[mem, 1024, 7, 1024]

This plots the results.

Wolfram Language code: ListPlot[CUDAMemoryGet[mem]]

This deletes allocated memory.

Wolfram Language code: CUDAMemoryUnload[mem]
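
The "Integer32[4]" output type reflects that one MD5 digest is 128 bits, or four 32-bit words per element. The same idea can be sketched on the CPU with the built-in Hash function. The name md5Words and the choice of hashing the pair {seed, index} are assumptions; this is not the scheme used inside md5_rand.cu.

```wolfram
(* Split the 128-bit MD5 digest of {seed, i} into four 32-bit words. *)
md5Words[seed_Integer, i_Integer] :=
  IntegerDigits[Hash[{seed, i}, "MD5"], 2^32, 4]

(* Four words for each of 1024 indices, using the assumed seed 7. *)
words = md5Words[7, #] & /@ Range[1024];

(* Scale every word to the unit interval to obtain uniform-looking reals. *)
uniforms = N[Flatten[words]/2^32];
```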

Normal Random Numbers

The following algorithms generate normally distributed random numbers.

Inverse Cumulative Normal Distribution

The following implements a way to generate normally distributed random numbers.

Wolfram Language code: srcf = FileNameJoin[{$CUDALinkPath, "SupportFiles", "random.cu"}]

This loads CUDAFunction.

Wolfram Language code: CUDAInverseCND = CUDAFunctionLoad[{srcf}, "InverseCND", {{_Real, _, "InputOutput"}, _Integer, _Integer}, 256]

Allocate memory for 100,000 random numbers.

Wolfram Language code:

sampleCount = 100000;
mem = CUDAMemoryAllocate[Real, sampleCount];

This calls CUDAFunction.

Wolfram Language code: CUDAInverseCND[mem, sampleCount, 0]

This gets the memory into the Wolfram Language.

Wolfram Language code: samples = CUDAMemoryGet[mem];

This plots the result, using Histogram.

Wolfram Language code: Histogram[samples, Automatic, "ProbabilityDensity"]

This unloads the memory.

Wolfram Language code: CUDAMemoryUnload[mem]
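
The idea behind the inverse cumulative normal method is that applying the inverse of the standard normal distribution function to a uniform number on (0, 1) gives a standard normal number. A minimal CPU sketch, using the identity that this inverse equals Sqrt[2] InverseErf[2 u - 1], is shown below. The name inverseCND is hypothetical, and the CUDA kernel most likely uses a polynomial or rational approximation rather than InverseErf.

```wolfram
(* Inverse of the standard normal cumulative distribution function. *)
inverseCND[u_] := Sqrt[2.] InverseErf[2 u - 1]

SeedRandom[1];
uniforms = RandomReal[{0, 1}, 10000];
normals = inverseCND[uniforms];

(* Sample mean and standard deviation; these should be near 0 and 1. *)
{Mean[normals], StandardDeviation[normals]}
```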

Box–Muller

Box–Muller is a method of generating normally distributed numbers, given a set of uniformly distributed random numbers. The CUDA implementation is found in the following file.

Wolfram Language code: srcf = FileNameJoin[{$CUDALinkPath, "SupportFiles", "random.cu"}]

This loads CUDAFunction.

Wolfram Language code: CUDABoxMuller = CUDAFunctionLoad[{srcf}, "BoxMuller", {{_Real, _, "InputOutput"}, _Integer}, 128]

This sets the input arguments.

Wolfram Language code:

MTRNGCount = 4096;
PATHN = 2 ^ 14;
NPerRNG = Ceiling[PATHN / MTRNGCount];
NPerRNG = If[EvenQ[NPerRNG], NPerRNG, NPerRNG + 1];
RANDN = MTRNGCount * NPerRNG;

Use the Mersenne Twister (defined two sections ago) to generate a list of uniformly distributed random numbers.

Wolfram Language code: mem = MersenneTwister[RANDN]

Transform the list of uniform random numbers to normally distributed random numbers.

Wolfram Language code: CUDABoxMuller[mem, NPerRNG, MTRNGCount]

You can see the bell curve when using Histogram.

Wolfram Language code: Histogram[CUDAMemoryGet[mem], Automatic, "ProbabilityDensity"]

This deletes allocated memory.

Wolfram Language code: CUDAMemoryUnload[mem]
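
The Box–Muller transform consumes uniform numbers in pairs: each pair (u1, u2) yields two independent standard normal numbers, Sqrt[-2 Log[u1]] Cos[2 Pi u2] and Sqrt[-2 Log[u1]] Sin[2 Pi u2]. That pairing is presumably why the per-generator count above is rounded up to an even number. A minimal CPU sketch follows; the name boxMuller is hypothetical and the layout of pairs in the CUDA kernel may differ.

```wolfram
(* Basic Box-Muller transform on two equal-length lists of uniform numbers. *)
boxMuller[u1_List, u2_List] :=
  With[{r = Sqrt[-2 Log[u1]], t = 2 Pi u2},
   Join[r Cos[t], r Sin[t]]]

SeedRandom[2];
n = 10000;
(* Keep u1 strictly positive so that Log[u1] stays finite. *)
u1 = RandomReal[{$MinMachineNumber, 1}, n];
u2 = RandomReal[{0, 1}, n];
normals = boxMuller[u1, u2];

(* Sample mean and standard deviation; these should be near 0 and 1. *)
{Mean[normals], StandardDeviation[normals]}
```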

Applications of Random Number Generators

Random numbers have applications in many areas. Here, two main applications are presented: Monte Carlo integration (approximating π and integrating an arbitrary function) and simulating Brownian motion.

Approximating π

The value of π can be approximated using Monte Carlo integration. First, generate uniformly distributed random points in the unit square. Then count the number of points that fall inside the first quadrant of the unit circle. Dividing that count by the total number of points gives an approximation of π/4, the area of the quarter circle.

This implements reduction, counting the number of points in a unit circle.

Wolfram Language code: srcf = FileNameJoin[{$CUDALinkPath, "SupportFiles", "reduceInCircle.cu"}]

This loads CUDAFunction.

Wolfram Language code: CountInCircle = CUDAFunctionLoad[{srcf}, "countInCircle", {{_Real, _, "Input"}, {_Integer, _, "Output"}, _Integer}, 256]

Use 1,000,000 points.

Wolfram Language code:

size = 1000000;
numThreads = 256;
numBlocks = Floor[(size + (numThreads * 2 - 1)) / (numThreads * 2)];

Generate the random numbers using the Mersenne Twister algorithm discussed previously.

Wolfram Language code: randomNumbers = MersenneTwister[2 * size]

This allocates the output memory.

Wolfram Language code: output = CUDAMemoryAllocate[Integer, numBlocks]

This performs the computation.

Wolfram Language code: CountInCircle[randomNumbers, output, size]

This gets the output memory.

Wolfram Language code: CUDAMemoryGet[output]//Total

The result agrees with the Wolfram Language.

Wolfram Language code: Select[Partition[CUDAMemoryGet[randomNumbers], 2][[ ;; size]], #[[1]] ^ 2 + #[[2]] ^ 2 ≤ 1&]//Length

The CUDA implementation is considerably faster.

Wolfram Language code: CountInCircle[randomNumbers, output, size];//AbsoluteTiming

For comparison, this times the same computation in the Wolfram Language.

Wolfram Language code: Select[Partition[CUDAMemoryGet[randomNumbers], 2][[ ;; size]], #[[1]] ^ 2 + #[[2]] ^ 2 ≤ 1&];//AbsoluteTiming
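
The steps above can be mirrored on the CPU to see how the per-block partial counts turn into an estimate of π. The sketch assumes that each block reduces 2*numThreads points, which is what the numBlocks formula suggests. The variable names are hypothetical and a smaller sample is used so that it runs quickly.

```wolfram
SeedRandom[42];
nPoints = 100000;
chunk = 2*256; (* assumed number of points handled per block: 2 * numThreads *)

(* Uniform points in the unit square. *)
points = RandomReal[1, {nPoints, 2}];

(* 1 if the point lies inside the quarter circle x^2 + y^2 <= 1, otherwise 0. *)
inside = UnitStep[1 - Total[points^2, {2}]];

(* One partial count per chunk, analogous to one output element per block. *)
partialCounts = Total /@ Partition[inside, UpTo[chunk]];

(* The fraction of points inside approximates Pi/4. *)
piEstimate = 4. Total[partialCounts]/nPoints
```

The statistical error shrinks like the inverse square root of the number of points, so each additional decimal digit costs about 100 times more samples.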

Monte Carlo Integration

Monte Carlo integration is used in many areas. Here, Sqrt[x] is integrated from 0 to 1.

Wolfram Language code:

src = "
__device__ Real_t integratedFunction(Real_t x) {
    return sqrt(x);
}
__global__ void monteCarlo(Real_t * evals, Real_t * randList, mint length) {
    int index = threadIdx.x + blockIdx.x*blockDim.x;
    if (index < length) {
        evals[index] = integratedFunction(randList[index]);
    }
}
";

This loads the function.

Wolfram Language code: monteCarlo = CUDAFunctionLoad[src, "monteCarlo", {{_Real, "Output"}, {_Real, "Input"}, _Integer}, 256]

Use the Sobol quasi-random number generator for random numbers.

Wolfram Language code:

srcf = FileNameJoin[{$CUDALinkPath, "SupportFiles", "random.cu"}];
sobolFun = CUDAFunctionLoad[{srcf}, "Sobol", {_Integer, _Integer, {_Integer, _, "Input"}, {_Real, _, "Output"}}, {64, 1}];
vectorCount = 100000;
dimension = 100;
directions = CUDAMemoryLoad[Flatten[Import[FileNameJoin[{$CUDALinkExampleDataPath, "soboldirection.txt"}], "Data"]]];
random = CUDAMemoryAllocate[Real, vectorCount * dimension];
gridDim = {64, dimension};
sobolFun[vectorCount, dimension, directions, random, gridDim];

Then use the number of generated random numbers as the length.

Wolfram Language code: len = First["Dimensions" /. CUDAMemoryInformation[random]]

This allocates memory for the output.

Wolfram Language code: output = CUDAMemoryAllocate[Real, len]

This calls the function.

Wolfram Language code: monteCarlo[output, random, len]

This checks whether the first few elements make sense.

Wolfram Language code: CUDAMemoryGet[output][[ ;; 20]]

You now need to sum the output. This can be done using CUDAFold.

Wolfram Language code: CUDAFold[Plus, 0, output] / len

The result agrees with NIntegrate.

Wolfram Language code: NIntegrate[Sqrt[x], {x, 0, 1}]

Unload the allocated memory.

Wolfram Language code: CUDAMemoryUnload[output, random]
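
The estimator used above is the average of the integrand over the sample points. A minimal CPU sketch of the same estimate, with a standard error attached, is shown below. It uses pseudorandom numbers from RandomReal rather than the Sobol sequence, and all variable names are hypothetical.

```wolfram
SeedRandom[7];
n = 200000;

(* Evaluate the integrand at uniform sample points on [0, 1]. *)
vals = Sqrt[RandomReal[1, n]];

(* The sample mean estimates the integral; the standard error estimates its accuracy. *)
estimate = Mean[vals];
stdError = StandardDeviation[vals]/Sqrt[n];

(* Exact value of the integral for comparison. *)
exact = Integrate[Sqrt[t], {t, 0, 1}];

{estimate, stdError, N[exact]}
```

With pseudorandom points the error decreases like the inverse square root of n; low-discrepancy sequences such as Sobol are used because they typically converge faster for smooth integrands.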

Brownian Motion

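This loads uniformly distributed random numbers onto the GPU as input for the simulation. It reuses sampleCount and CUDAInverseCND from the Inverse Cumulative Normal Distribution section.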

Wolfram Language code: mem = CUDAMemoryLoad[RandomReal[{0.0, 1.0}, sampleCount]]

Wolfram Language code: CUDAInverseCND[mem, sampleCount, 1];

Wolfram Language code: samples = CUDAMemoryGet[mem];

The values of the pseudorandom sequence are normally distributed.

Wolfram Language code: Histogram[samples]

Wolfram Language code: brownianMotion = FoldList[Plus, 0, samples];

Wolfram Language code: ListPlot[brownianMotion]
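
The FoldList step above builds the path as a running sum of normally distributed increments. A self-contained CPU sketch of the same construction is given below, with the increments scaled by Sqrt[dt] so that the path covers the time interval from 0 to 1. The scaling and the variable names are assumptions added for illustration; the example above sums the unscaled samples.

```wolfram
SeedRandom[3];
steps = 1000;
dt = 1./steps;

(* Independent normal increments with variance dt. *)
increments = Sqrt[dt] RandomVariate[NormalDistribution[0, 1], steps];

(* Running sum starting at 0; equivalent to Prepend[Accumulate[increments], 0]. *)
path = FoldList[Plus, 0, increments];

ListLinePlot[path]
```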

Code Generation

Since CUDALink is integrated in the Wolfram Language, you can use Wolfram Language features like SymbolicC to generate CUDA kernel code. If you have not done so already, import CUDALink.

Wolfram Language code: Needs["CUDALink`"]

For this example, you need the SymbolicC package.

Wolfram Language code: Needs["SymbolicC`"]

This defines some common Wolfram Language constructs, translating them to their SymbolicC representation.

Wolfram Language code:

ClearAll[toSymbolicC]
ClearAll[x]
ClearAll[xx]
SetAttributes[toSymbolicC, {HoldAll}]
toSymbolicC[x_List] := toSymbolicC /@ x
toSymbolicC[Times[-1, x_]] := "-" <> ToCCodeString[toSymbolicC[x]]
toSymbolicC[(op : (Plus | Times))[args___]] := COperator[op, toSymbolicC[{args}]]
toSymbolicC[(op : (Minus | BitNot | Not | Decrement | Increment | PreDecrement | PreIncrement))[x_]] := COperator[op, toSymbolicC[x]]
toSymbolicC[(op : (Mod | Divide | Subtract | BitShiftRight | BitShiftLeft))[x_, y_]] := COperator[op, {toSymbolicC[x], toSymbolicC[y]}]
toSymbolicC[(op : (ArcCos | ArcSin | Ceiling | Cos | Cosh | Exp | Abs | Floor | Sin | Sinh | Sqrt | Tan | Tanh | Log))[x_]] := CStandardMathOperator[op, toSymbolicC[x]]
toSymbolicC[Power[x_, r : Rational[_, _]]] := CStandardMathOperator[Power, {toSymbolicC[x], toSymbolicC[r]}]
toSymbolicC[Power[x_, 2]] := COperator[Times, {toSymbolicC[x], toSymbolicC[x]}]
toSymbolicC[Power[x_, y_]] := CStandardMathOperator[Power, {toSymbolicC[x], toSymbolicC[y]}]
toSymbolicC[CompoundExpression[stmts__]] := toSymbolicC /@ stmts
toSymbolicC[If[cond_, trueStmt_]] := CIf[toSymbolicC[cond], toSymbolicC[trueStmt]]
toSymbolicC[If[cond_, trueStmt_, falseStmt_]] := CIf[toSymbolicC[cond], toSymbolicC[trueStmt], toSymbolicC[falseStmt]]
toSymbolicC[x_Rational] := N[x]
toSymbolicC[x_] := x

To test, pass in a Wolfram Language statement and get the SymbolicC output.

Wolfram Language code: toSymbolicC[Sin[x] ^ 3 + x ^ 8 + 3]

To convert to a C string, use the ToCCodeString function.

Wolfram Language code: ToCCodeString[%]
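
SymbolicC expressions can also be assembled by hand, which shows what the translator is producing. The sketch below builds a small guarded assignment and wraps it in a plain C function. The function name applySqrt and its parameter names are hypothetical, and a plain C function is generated here rather than a CUDA kernel.

```wolfram
Needs["SymbolicC`"]

(* if (index < length) out[index] = sqrt(in[index]); *)
guardedAssign = CIf[COperator[Less, {"index", "length"}],
   CAssign[CArray["out", "index"],
    CStandardMathOperator[Sqrt, CArray["in", "index"]]]];

(* Wrap the statement in a C function definition and render it as a string. *)
code = ToCCodeString[
   CFunction["void", "applySqrt",
    {{CPointerType["double"], "in"}, {CPointerType["double"], "out"},
     {"int", "length"}, {"int", "index"}},
    CBlock[{guardedAssign}]]]
```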

The above allows you to write a function that takes a Wolfram Language function (pure or not) and generates the appropriate CUDA kernel source.

Wolfram Language code:

ClearAll[CUDAMapSource];
SetAttributes[CUDAMapSource, {HoldAll}];
CUDAMapSource[f_] := ToCCodeString[With[{fun  = f[xx] /. xx -> CArray["lst", "index"]}, 
	SymbolicCUDAFunction["map", {{CPointerType[{"Real_t"}], "lst"}, {"mint", "length"}}, 
	CBlock[{
	SymbolicCUDADeclareIndexBlock[1], 
	CIf[COperator[Less, {"index", "length"}], 
	CAssign[CArray["lst", "index"], toSymbolicC[fun]]
	]
	}]
	]]]

Passing a pure function to CUDAMapSource returns the kernel code.

Wolfram Language code: CUDAMapSource[# + Sin[#]&]

This defines a function that, given a Wolfram Language function and an input list, generates the CUDA kernel code, loads the code as a CUDAFunction, runs the CUDAFunction, and returns the result.

Wolfram Language code:

SetAttributes[myCUDAMap, HoldFirst];
myCUDAMap[fun_, input_List] := 
	Module[{len = Length[input], cudaFun}, 
	cudaFun = CUDAFunctionLoad[CUDAMapSource[fun], "map", {{_Real, "InputOutput"}, _Integer}, 256];
	First[cudaFun[input, len]]
	]

You can test myCUDAMap with a pure function that adds 2 to each element in an input list.

Wolfram Language code: myCUDAMap[# + 2&, ConstantArray[1.0, 100]]

Any construct translated by toSymbolicC is supported by myCUDAMap. Here, each element is squared.

Wolfram Language code: myCUDAMap[# ^ 2&, Range[100]]

This performs Monte Carlo integration of x^2 over the interval from 0 to 1. The result should be close to the exact value, 1/3.

Wolfram Language code: CUDAFold[Plus, 0, myCUDAMap[# ^ 2&, RandomReal[1, 1000000]]] / 1000000

Functions can be defined and passed into myCUDAMap. Here, a color negation function is defined.

Wolfram Language code: colorNegate[x_] := 1.0 - x

Invoke the color negation function.

Wolfram Language code: myCUDAMap[colorNegate, Range[10]]

The above is a simple example, but can be used as a seed for more complicated ones.
