Perlin noise is a common algorithm used to generate procedural textures. This is a textbook implementation of the noise function:
This loads the

function:
This generates the permutation table used in the noise algorithm:
This sets the width and height. It also allocates the output memory:
This defines the parameters to the Perlin noise:
This calls the Perlin noise function. The output is a CUDAMemory handle:
The memory is retrieved from the GPU and displayed as an image:
Placing the result in Manipulate shows how the output changes as the parameters vary:
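For reference, a CPU version of the noise function can be sketched in Python. This is a minimal 2D sketch of Ken Perlin's improved noise: a shuffled permutation table hashes lattice corners, each corner gets one of eight gradient directions, and a quintic fade curve blends the four corner contributions. The seed, the gradient set, and the names make_permutation and perlin2 are assumptions for illustration, not necessarily what the CUDA kernel uses.

```python
import math
import random

def make_permutation(seed=0):
    """Shuffle 0..255 and append a copy so lookups up to index 511 need no wraparound."""
    p = list(range(256))
    random.Random(seed).shuffle(p)
    return p + p

def fade(t):
    # Quintic curve 6t^5 - 15t^4 + 10t^3 used by the improved noise
    return t * t * t * (t * (t * 6 - 15) + 10)

def lerp(a, b, t):
    return a + t * (b - a)

def grad(h, x, y):
    # Pick one of 8 gradient directions from the low 3 bits of the hash
    gx, gy = [(1, 1), (-1, 1), (1, -1), (-1, -1),
              (1, 0), (-1, 0), (0, 1), (0, -1)][h & 7]
    return gx * x + gy * y

def perlin2(x, y, perm):
    """Gradient noise at (x, y); it is exactly 0 at integer lattice points."""
    xi, yi = math.floor(x), math.floor(y)
    xf, yf = x - xi, y - yi          # position inside the unit cell
    xi, yi = xi & 255, yi & 255      # wrap the cell index into the table
    u, v = fade(xf), fade(yf)
    # Hash the four corners of the cell
    aa = perm[perm[xi] + yi]
    ab = perm[perm[xi] + yi + 1]
    ba = perm[perm[xi + 1] + yi]
    bb = perm[perm[xi + 1] + yi + 1]
    # Blend the corner contributions along x, then along y
    x1 = lerp(grad(aa, xf, yf), grad(ba, xf - 1, yf), u)
    x2 = lerp(grad(ab, xf, yf - 1), grad(bb, xf - 1, yf - 1), u)
    return lerp(x1, x2, v)
```

Sampling perlin2 over a grid of points such as (i * 0.05, j * 0.05) produces the kind of smooth grayscale field shown above.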
With Perlin noise, you can create procedural landscapes. This defines the width and height and allocates memory for the landscape:
These parameters define the landscape:
The data is retrieved, and image processing functions are used to smooth the elevation map:
The result is similar to a mountain range:
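The text does not name the smoothing functions it uses. As one possible stand-in, a simple box blur replaces each elevation value with the mean of its neighborhood. This Python sketch, with the hypothetical name box_blur, averages only the neighbors that fall inside the grid, so edges are not padded.

```python
def box_blur(grid, radius=1):
    """Average each cell with the in-bounds neighbors within `radius` (a stand-in smoother)."""
    h, w = len(grid), len(grid[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            total, count = 0.0, 0
            for di in range(-radius, radius + 1):
                for dj in range(-radius, radius + 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < h and 0 <= jj < w:  # skip neighbors outside the grid
                        total += grid[ii][jj]
                        count += 1
            out[i][j] = total / count
    return out
```

Applying box_blur more than once, or with a larger radius, flattens sharp peaks further.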
This deallocates the memory:
Varying the noise parameters results in different patterns. Here, a wood texture is created:
The following parameters are known to produce a wood texture:
This defines a helper function that recolors the grayscale image:
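A recoloring helper of this kind can be sketched as a linear color ramp: each grayscale value in [0, 1] is mapped to a point between a dark and a light color. The function name recolor and the two RGB endpoints below are hypothetical choices, not the colors used in the original example.

```python
def recolor(gray, dark=(0.40, 0.20, 0.05), light=(0.85, 0.60, 0.35)):
    """Map each grayscale value (clamped to [0, 1]) to an RGB color between dark and light."""
    out = []
    for row in gray:
        new_row = []
        for g in row:
            g = min(max(g, 0.0), 1.0)
            # (1 - g) * dark + g * light hits both endpoints exactly at g = 0 and g = 1
            new_row.append(tuple((1 - g) * d + g * l for d, l in zip(dark, light)))
        out.append(new_row)
    return out
```

Because the ramp is linear, the grain structure of the noise is preserved and only its palette changes.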
Here, the wood texture is generated:
As before, the result can be plotted onto a surface:
The original source code defines more noise functions. This loads all of them:
Here, Manipulate is used to showcase the different noise functions:
This deallocates the memory:
The histogram algorithm places the elements of a list into bins according to their values. The following implements a histogram with one bin for each value from 0 to 255:
This loads the two CUDA kernel functions:
This gets sample data. An image is chosen in this case, and its ImageData is flattened:
The algorithm requires temporary memory to hold the intermediate histograms:
This computes the sub-histograms and stores them in the intermediate memory allocated in the previous step:
This merges the temporary histograms:
This gets the output histogram:
This unloads the temporary memory. Failing to do so results in a memory leak:
This plots the output histogram:
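The two-phase structure above (partial histograms, then a merge) can be mirrored on the CPU. In this Python sketch each chunk of the input stands in for the portion handled by one thread block; the chunk size and the function names sub_histograms and merge_histograms are illustrative assumptions.

```python
def sub_histograms(data, chunk_size):
    """Phase 1: each chunk counts its own values into a private 256-bin histogram."""
    partials = []
    for start in range(0, len(data), chunk_size):
        bins = [0] * 256
        for v in data[start:start + chunk_size]:
            bins[v] += 1
        partials.append(bins)
    return partials

def merge_histograms(partials):
    """Phase 2: add the partial histograms bin by bin."""
    merged = [0] * 256
    for bins in partials:
        for i, count in enumerate(bins):
            merged[i] += count
    return merged
```

Private partial histograms avoid having every thread contend for the same 256 counters; the merge step then only touches 256 entries per partial.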
The scan, or prefix sum, algorithm is similar to FoldList and is a useful primitive that appears in many parallel algorithms. The CUDA implementation is found in the following location:
This loads the three kernels used in computation:
This generates random input data:
This allocates the output buffer:
This computes the block and grid dimensions:
A temporary buffer is needed in computation:
This performs the scan operation:
This retrieves the output buffer:
Minus the first term, the result agrees with FoldList:
This deallocates the CUDAMemory elements:
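A common way to scan a large array on the GPU is to scan fixed-size blocks independently, scan the block totals, and then add each block's offset back. The text does not describe what its three kernels do, so treat the mapping onto these three phases as an assumption. This Python sketch computes an exclusive scan (each output excludes its own element); the block size of 4 is only for illustration.

```python
def exclusive_scan(xs):
    """Sequential exclusive prefix sum: out[i] = xs[0] + ... + xs[i-1]."""
    out, running = [], 0
    for x in xs:
        out.append(running)
        running += x
    return out

def blocked_scan(xs, block=4):
    blocks = [xs[i:i + block] for i in range(0, len(xs), block)]
    # Phase 1: scan each block independently and record its total
    local = [exclusive_scan(b) for b in blocks]
    totals = [sum(b) for b in blocks]
    # Phase 2: scan the block totals to get each block's starting offset
    offsets = exclusive_scan(totals)
    # Phase 3: add each block's offset to its local results
    return [v + off for part, off in zip(local, offsets) for v in part]
```

For xs = [1, 2, ..., 10], both functions return [0, 1, 3, 6, 10, 15, 21, 28, 36, 45]; an inclusive scan would instead end with the full sum, 55.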
The reduction kernel is similar to Fold in Mathematica because it reduces a list using a binary operation. Whereas scan keeps the intermediate results, reduce discards them. This loads the reduction CUDAFunction:
This sets the input and output buffers:
This performs the computation:
Each block reduces 512 elements of the list; therefore, you need multiple calls to reduce lists larger than 512 elements. This list is small, so no loop is necessary. This gets the output memory from the previous step, assigns the memory in

to

, and frees

:
This allocates a new output buffer:
This performs a second reduction:
The output is retrieved and the output buffer is unloaded:
The result agrees with Mathematica:
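The multi-pass scheme described above can be sketched in Python: each block of 512 elements is reduced to one value with a pairwise tree reduction, and passes repeat until a single value remains. Addition is assumed as the binary operation, and the names reduce_block, reduce_pass, and reduce_all are hypothetical.

```python
BLOCK = 512  # elements reduced per block, as stated in the text

def reduce_block(chunk):
    """Tree reduction: repeatedly add the upper half of the active range onto the lower half."""
    vals = list(chunk)
    n = len(vals)
    while n > 1:
        half = (n + 1) // 2
        for i in range(n // 2):
            vals[i] += vals[i + half]
        n = half
    return vals[0]

def reduce_pass(xs):
    """One kernel launch: every block of up to BLOCK elements yields one partial sum."""
    return [reduce_block(xs[i:i + BLOCK]) for i in range(0, len(xs), BLOCK)]

def reduce_all(xs):
    """Repeat passes until one value is left, as larger lists require."""
    while len(xs) > 1:
        xs = reduce_pass(xs)
    return xs[0]
```

A 2000-element list needs two passes: the first leaves 4 partial sums, and the second combines them.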
The following implements a color converter from the RGB color space to HSB. The CUDA implementation is in the following file:
This loads the

function from the source file:
This sets the input image along with the input parameters:
This allocates memory for the output:
This converts the image to HSB space:
By default, Image interprets the data as RGB, so the output is wrong:
Use ColorSpace to get the proper output:
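The per-pixel conversion follows the standard RGB-to-HSB (also called HSV) formulas: brightness is the largest channel, saturation is the channel spread relative to it, and hue depends on which channel is largest. This Python sketch works on one pixel with channels in [0, 1] and returns hue scaled to [0, 1); the name rgb_to_hsb is hypothetical, and the CUDA kernel may order its arithmetic differently.

```python
def rgb_to_hsb(r, g, b):
    """Convert one RGB pixel (channels in [0, 1]) to (hue, saturation, brightness)."""
    mx, mn = max(r, g, b), min(r, g, b)
    delta = mx - mn
    brightness = mx
    saturation = 0.0 if mx == 0 else delta / mx
    if delta == 0:
        hue = 0.0                      # gray: hue is undefined, use 0
    elif mx == r:
        hue = ((g - b) / delta) % 6    # between magenta and yellow
    elif mx == g:
        hue = (b - r) / delta + 2      # between yellow and cyan
    else:
        hue = (r - g) / delta + 4      # between cyan and magenta
    return (hue / 6, saturation, brightness)
```

Pure red maps to (0, 1, 1), pure green to (1/3, 1, 1), and mid-gray to (0, 0, 0.5).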
The following code implements the Caesar cipher. The Caesar cipher is a simple substitution cipher; this version adds the value 3 to each character in the text. Here is the CUDA implementation:
This loads some example text; the Declaration of Independence is loaded in this case:
Here, the function is loaded from the code string:
This calls the CUDA function and displays only the first 100 characters of the output:
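Because every character is transformed independently, the cipher maps naturally onto one GPU thread per character. The same transformation can be sketched in Python; like the description above, it adds 3 to each character code with no wraparound at the end of the alphabet, so it differs from the classical letters-only Caesar cipher. The name caesar is hypothetical.

```python
def caesar(text, shift=3):
    """Add `shift` to every character code (no wraparound, non-letters included)."""
    return "".join(chr(ord(c) + shift) for c in text)
```

Calling caesar with shift=-3 undoes the encoding.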
The following implements a moving average:
This loads the
CUDAFunction defining the macro

as

:
This defines the input parameters and allocates memory for the output:
This calls the CUDAFunction:
This gets the output memory:
Memory is unloaded:
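A moving average of window r replaces each run of r consecutive elements with their mean, producing len(xs) - r + 1 values, which matches the convention of Mathematica's MovingAverage with an integer window. The text does not show how the CUDA kernel defines its window, so this Python sketch, with the hypothetical name moving_average, is a reference under that assumption; it uses a prefix-sum array so each output costs one subtraction.

```python
def moving_average(xs, r):
    """Mean of each run of r consecutive elements; returns len(xs) - r + 1 values."""
    if not 1 <= r <= len(xs):
        raise ValueError("window must satisfy 1 <= r <= len(xs)")
    prefix = [0]
    for x in xs:
        prefix.append(prefix[-1] + x)  # prefix[i] = sum of the first i elements
    return [(prefix[i + r] - prefix[i]) / r for i in range(len(xs) - r + 1)]
```

For example, a window of 2 over [1, 2, 3, 4, 5] gives [1.5, 2.5, 3.5, 4.5].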
The Black-Scholes formula is widely used in financial computation.
CUDALink provides CUDAFinancialDerivative, which computes the prices of financial options. To demonstrate how such a function is written, a simple version is implemented here:
This loads the
CUDAFunction. Set the

to

, which means that

is interpreted as

:
This assigns the input parameters:
This calls the function:
This gets the output memory:
This unloads allocated memory:
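The formula the kernel evaluates for each option can be written out on the CPU. This Python sketch prices European call and put options on a non-dividend-paying asset with spot S, strike K, risk-free rate r, volatility sigma, and time to expiry T; the parameter names and the choice to return both prices are assumptions rather than the layout of the CUDA example.

```python
import math

def norm_cdf(x):
    """Standard normal cumulative distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def black_scholes(S, K, r, sigma, T):
    """Return (call, put) prices of European options under Black-Scholes."""
    d1 = (math.log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    discount = math.exp(-r * T)
    call = S * norm_cdf(d1) - K * discount * norm_cdf(d2)
    put = K * discount * norm_cdf(-d2) - S * norm_cdf(-d1)
    return call, put
```

With S = K = 100, r = 0.05, sigma = 0.2, and T = 1, this gives a call price of about 10.45 and a put price of about 5.57; their difference equals S - K e^(-rT), as put-call parity requires.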